This is a sample reference model for Flax/JAX training, trained only on the MC4 dataset. It was trained for roughly three days on a TPU v3-8. Training procedure...
My description