When I run the Flux model-zoo's conv_mnist.jl example, there is a `seed` argument which is supposed to ensure reproducibility (when set larger than 0). This only seems to work for experiments on CPU, whereas GPU experiments give slightly different results between runs.
When running on GPU, the epoch 0 losses and accuracies are identical between runs. However, as training progresses the losses evolve slightly differently, so the training process somehow seems to be non-deterministic on GPU. I am very curious whether Flux has a way to get reproducible GPU results.
I googled around and found that something similar happens in PyTorch, because cuDNN may choose different algorithm implementations between runs. Calling `torch.use_deterministic_algorithms(True)` removes this source of non-determinism in PyTorch. Is there a similar fix in Flux?
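For reference, this is roughly the seeding I would expect to need on GPU (a sketch from memory, not the exact conv_mnist.jl code; I am assuming CUDA.jl's `CUDA.seed!` covers the device-side RNG):

```julia
using Random
using CUDA  # provides the GPU-side RNG

# Seed both the host and device RNGs before building the model.
# This makes the epoch-0 weights and losses identical between runs,
# but (as observed above) later epochs can still diverge, presumably
# because cuDNN may pick a different convolution algorithm each run.
seed = 42
Random.seed!(seed)  # CPU RNG, used e.g. for weight initialization
CUDA.seed!(seed)    # GPU RNG, used e.g. for dropout on the device
```

Even with both RNGs seeded like this, the run-to-run differences remain, which is why I suspect the algorithm selection rather than the random state.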