Different behaviour between Flux.jl and Pytorch

ToucheSir · February 9, 2021, 6:29pm

Yes, use Flux.gradient or pullback instead of train! so that you can analyze the gradients and model parameters before each update.

Another point to consider is that Flux and PyTorch initialize Dense layers differently by default. See Initializing Flux weights the same as PyTorch? - #4 by DevJac. If you can verify that a) the initializations are similar, b) the outputs from each intermediate step of the forward pass are similar, and c) the gradients are similar, then I think the behaviour shouldn’t be much different between PyTorch and Flux. If there is a bug (which seems somewhat unlikely since you’re using a plain MLP on CPU), I would imagine it’s somewhere in the backwards pass (and thus will show up in the gradients).

Topic		Replies	Views
The same network performs differently in Flux.jl and tensorflow Machine Learning performance	13	3096	December 18, 2019
Flux.jl RNN performance Machine Learning	11	3151	October 28, 2018
Params not getting updated during training New to Julia flux	25	1751	October 11, 2020
Attempted to port an RL algorithm from PyTorch to Flux and it's 10x slower Machine Learning	17	1390	January 29, 2021
[ANN] Flux v0.10 Machine Learning	36	5404	February 4, 2020

Different behaviour between Flux.jl and Pytorch

Related topics