Flux: reproducibility of GPU experiments

When I run the Flux model-zoo's conv_mnist.jl example, there is a `seed` argument which should ensure reproducibility (when set to a value greater than 0). This only seems to work for experiments on the CPU, whereas GPU experiments give slightly different results between runs.
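
For reference, here is a minimal sketch (assuming the standard Flux/CUDA.jl APIs; `set_seed` is a hypothetical helper, not the model zoo's actual code) of how such a `seed` argument is typically consumed. Seeding both RNGs turns out to be necessary but not sufficient for bit-identical GPU runs:

```julia
using Random, CUDA

# Hypothetical helper: seed both the CPU and (if available) the GPU RNG.
function set_seed(seed::Integer)
    if seed > 0
        Random.seed!(seed)                     # Julia's default CPU RNG
        CUDA.functional() && CUDA.seed!(seed)  # GPU RNG via CUDA.jl
    end
end
```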

When running on the GPU, the epoch 0 losses and accuracies are identical between runs. However, as training progresses, the losses diverge slightly from run to run, so the training process is somehow non-deterministic on the GPU. Does Flux have a way to get reproducible GPU results?
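
To make the comparison concrete, here is a minimal sketch of the kind of check I am describing, assuming a tiny stand-in conv model rather than the full conv_mnist.jl script (to mirror the run-to-run behaviour above, compare the printed values across separate Julia sessions):

```julia
using Flux, CUDA, Random

# Build and evaluate the same tiny conv model from the same seed,
# then compare the resulting loss values bit-for-bit.
function run_once(seed)
    Random.seed!(seed)
    CUDA.seed!(seed)
    m = Conv((3, 3), 1 => 8, relu) |> gpu   # conv layers go through cuDNN
    x = CUDA.randn(Float32, 28, 28, 1, 16)
    loss, _ = Flux.withgradient(m -> sum(abs2, m(x)), m)
    return loss
end

run_once(42) == run_once(42)  # can be false when cuDNN picks different algorithms
```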

I googled around and found that something similar happens in PyTorch because cuDNN may choose different algorithm implementations between runs; calling `torch.use_deterministic_algorithms(True)` removes this source of non-determinism in PyTorch. Is there a similar fix in Flux?
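
(As a side note, part of what that PyTorch switch relies on is a CUDA-level environment variable for deterministic cuBLAS workspace behaviour, which can also be set from Julia. This is a partial workaround at best: it covers only the cuBLAS side, not cuDNN's algorithm selection, which seems to be the main suspect here.)

```julia
# CUBLAS_WORKSPACE_CONFIG is a CUDA-level (not Flux-specific) setting for
# deterministic cuBLAS behaviour; it must be set before CUDA is initialized.
ENV["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

using CUDA, Flux
# ...continue with training as usual
```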


As I understand it, most of the non-determinism comes from cuDNN. I've opened [Option for filtering by CUDNN_DETERMINISTIC in `cudnnConvolutionAlgoPerfChoose` (JuliaGPU/CUDA.jl#938)](https://github.com/JuliaGPU/CUDA.jl/issues/938) to track adding the right machinery to force it to be deterministic.
