Distributed Data Parallel training with 2 GPUs fails with Flux.jl on AMD GPUs

Hi!

It's unlikely to help, but can you try with Zygote@0.7.4?

Surprisingly, the NaN during training also disappears when I print the optimizer's state after the Optimisers.Descent step.

This looks like a synchronization issue. Instead of printing, can you try adding an explicit AMDGPU.synchronize()?
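
Roughly where I'd put the call — a minimal sketch of a single training step, assuming a plain Optimisers.jl loop (the function and variable names here are made up, and the DDP/all-reduce part of your setup is omitted):

```julia
using Flux, Optimisers, Zygote, AMDGPU

# Hypothetical single training step, just to show where the call would go;
# `model`, `opt_state`, `x`, `y` stand in for your actual setup.
function train_step!(model, opt_state, x, y)
    loss, grads = Zygote.withgradient(m -> Flux.Losses.mse(m(x), y), model)
    opt_state, model = Optimisers.update!(opt_state, model, grads[1])

    # Instead of `@show opt_state` (which happened to make the NaN go away),
    # block the host until all queued kernels on this GPU have finished:
    AMDGPU.synchronize()

    return model, opt_state, loss
end
```

If the NaN only vanishes when something forces the host to wait on the device (a print, or an explicit synchronize), that would be consistent with the gradients being read or reduced across the two GPUs before the kernels that compute them have finished.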
