Hi!
Unlikely that it will help, but can you try with Zygote@0.7.4?
Surprisingly, the NaN during training also disappears when I print the optimizer’s state after
Optimisers.Descent.
This looks like a synchronization issue. Can you also add an explicit AMDGPU.synchronize() instead of printing?
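
Something like the sketch below is what I have in mind. The toy model, data, loss, and the `train!` helper are made up for illustration (I don't know what your training loop looks like); the only part that matters is the `AMDGPU.synchronize()` call placed where the print used to be.

```julia
using AMDGPU, Optimisers, Zygote

# Hypothetical toy model and data, only here to make the sketch self-contained.
model = (W = ROCArray(randn(Float32, 4, 4)), b = ROCArray(zeros(Float32, 4)))
x = ROCArray(randn(Float32, 4, 8))
y = ROCArray(randn(Float32, 4, 8))

# Mean squared error of a single linear layer (assumed, not your actual loss).
loss(m, x, y) = sum(abs2, m.W * x .+ m.b .- y) / length(y)

function train!(model, x, y; steps = 100)
    state = Optimisers.setup(Optimisers.Descent(0.01f0), model)
    for step in 1:steps
        grads = Zygote.gradient(m -> loss(m, x, y), model)[1]
        state, model = Optimisers.update!(state, model, grads)

        # Explicit synchronization in place of printing the optimizer state:
        # block the host until the queued GPU work has finished.
        AMDGPU.synchronize()

        # Optional check so a NaN is reported at the step where it first appears.
        isnan(loss(model, x, y)) && error("NaN loss at step $step")
    end
    return model
end

model = train!(model, x, y)
```

If the NaN goes away with the explicit synchronize as well, that would point at a missing synchronization somewhere rather than at the printing itself.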