Flux: reproducibility of GPU experiments

When I run the Flux model-zoo's conv_mnist.jl example, there is a `seed` argument which should ensure reproducibility (when set to a value greater than 0). This only seems to work for experiments on the CPU, whereas GPU experiments give slightly different results between runs.
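
For reference, here is a minimal sketch (assuming the standard Flux/CUDA.jl APIs; `set_seed` is a hypothetical helper, not the model zoo's actual code) of how such a `seed` argument is typically consumed. Seeding both RNGs turns out to be necessary but not sufficient for bit-identical GPU runs:

```julia
using Random, CUDA

# Hypothetical helper: seed both the CPU and (if available) the GPU RNG.
function set_seed(seed::Integer)
    if seed > 0
        Random.seed!(seed)                     # Julia's default CPU RNG
        CUDA.functional() && CUDA.seed!(seed)  # GPU RNG via CUDA.jl
    end
end
```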

When running on the GPU, the epoch 0 losses and accuracies are identical between runs. However, as training progresses, the losses diverge slightly from run to run, so the training process is somehow non-deterministic on the GPU. Does Flux have a way to get reproducible GPU results?
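
To make the comparison concrete, here is a minimal sketch of the kind of check I am describing, assuming a tiny stand-in conv model rather than the full conv_mnist.jl script (to mirror the run-to-run behaviour above, compare the printed values across separate Julia sessions):

```julia
using Flux, CUDA, Random

# Build and evaluate the same tiny conv model from the same seed,
# then compare the resulting loss values bit-for-bit.
function run_once(seed)
    Random.seed!(seed)
    CUDA.seed!(seed)
    m = Conv((3, 3), 1 => 8, relu) |> gpu   # conv layers go through cuDNN
    x = CUDA.randn(Float32, 28, 28, 1, 16)
    loss, _ = Flux.withgradient(m -> sum(abs2, m(x)), m)
    return loss
end

run_once(42) == run_once(42)  # can be false when cuDNN picks different algorithms
```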

I googled around and found that something similar happens in PyTorch because cuDNN may choose different algorithm implementations between runs; calling `torch.use_deterministic_algorithms(True)` removes this source of non-determinism in PyTorch. Is there a similar fix in Flux?
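
(As a side note, part of what that PyTorch switch relies on is a CUDA-level environment variable for deterministic cuBLAS workspace behaviour, which can also be set from Julia. This is a partial workaround at best: it covers only the cuBLAS side, not cuDNN's algorithm selection, which seems to be the main suspect here.)

```julia
# CUBLAS_WORKSPACE_CONFIG is a CUDA-level (not Flux-specific) setting for
# deterministic cuBLAS behaviour; it must be set before CUDA is initialized.
ENV["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

using CUDA, Flux
# ...continue with training as usual
```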


As I understand it, most of the non-determinism comes from cuDNN. I've opened [Option for filtering by CUDNN_DETERMINISTIC in `cudnnConvolutionAlgoPerfChoose` (JuliaGPU/CUDA.jl#938)](https://github.com/JuliaGPU/CUDA.jl/issues/938) to track adding the right machinery to force it to be deterministic.
