What exactly does Flux.destructure do?

Hi all,

I built a custom neural network model, UnetSkipGenerator, trained it, and needed to save the results, so I used Flux.destructure and saved the parameter vector it returned. Now I am trying to do some computations with those parameters, but I noticed a behaviour that I cannot explain. When I copy parameter values from one UnetSkipGenerator to another, as follows, it works just fine:

Gxt1_nn = UnetSkipGenerator(3, 8, 128, 16, upsz) |> gpu
parst, nnt = Flux.destructure(Gxt1_nn)
Gxt1 = nnt(parst)
Gxt2 = UnetSkipGenerator(3, 8, 128, 16, upsz) |> gpu
h1 = Flux.params(Gxt1)
h2 = Flux.params(Gxt2)
for j = 1:length(h1)
    h1[j] .= bezier_phi_theta(0.0f0, θx1[j], h2[j], w2[2][j])
end

function bezier_phi_theta(t, θb, w1b, w2b)
    return (1-t)^2*w1b + 2*t*(1-t)*θb + t^2*w2b
end

(Note: the new value of h1 does not depend on θx1 or w2[2], because t = 0.0.)
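To spell out why: with t = 0 only the first term of the Bézier formula survives, so the call reduces to w1b and the loop effectively copies h2 into h1. A quick sanity check with arbitrary placeholder numbers:

bezier_phi_theta(0.0f0, 5.0f0, 1.0f0, 9.0f0)  # == 1.0f0, i.e. the w1b argument
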
But when I initialize Gxt1 with the saved parameters, the networks return different values. I am pretty sure that the previously trained network was constructed the same way.
This is how I load the saved parameters:

res = matread("file_with_saved_params.mat")
θx = copy(res["theta"]) |> gpu  # saved parameters

Any suggestions on where the problem could be, or where to look? Thanks in advance.

EDIT:
I will just add the other manipulation I do with the loaded data. I don’t think it influences the behaviour, but these lines always run before the piece of code above.

Gx = UnetSkipGenerator(3, 8, 128, 16, upsz) |> gpu
θ, nn = Flux.destructure(Gx) |> gpu
res = matread("file_with_saved_params.mat")
θx = copy(res["theta"])
res2 = matread("file_with_saved_params2.mat")
θx2 = copy(res2["theta"])
Gxa = nn(θx) |> gpu
Gxb = nn(θx2) |> gpu

Gx1 = UnetSkipGenerator(3, 8, 128, 16, upsz) |> gpu
θx1 = Flux.params(Gx1)
θa = Flux.params(Gxa)
θb = Flux.params(Gxb)
w1 = [copy(pka), deepcopy(θa)]  # pka and pkb come from a different network
w2 = [copy(pkb), deepcopy(θb)]

You are creating Gxt1 from the saved params, which then go into h1, while h2 holds random values? In bezier_phi_theta you assign values from h2 to h1, so to me it seems like you are overwriting the loaded values with the random initialization of Gxt2?

I feel like I might be missing something here.

My original problem is more complicated than this, and this short piece of code does not really make sense on its own, but it does show the behaviour I am writing about. h2 is randomly initialized, as is parst. I am overwriting h1 with h2, but only the values are copied and the indices do not depend on each other, so I do not think I am overwriting something I would use later.

I have the same feeling, but I just can’t find what exactly is missing.

EDIT: I checked whether the values of θx were being overwritten, but they seem to be correct.

Unfortunately the MWE here is missing a W (notably the entire definition of UnetSkipGenerator), so nobody can replicate this. I’m also confused as to why the parameters are being saved as .mat files. Have you considered just using JLD2 or some other Julia-aware serialization library instead? That could save a number of headaches around destructure itself.
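For example, something along these lines avoids destructure entirely (just a sketch, assuming a Flux version that provides Flux.state / Flux.loadmodel! and that JLD2 is installed; the file name is made up):

using Flux, JLD2

model = UnetSkipGenerator(3, 8, 128, 16, upsz) |> gpu

# save: move the model to the CPU first, then store its nested state
jldsave("unet_checkpoint.jld2"; model_state = Flux.state(cpu(model)))

# load: rebuild the same architecture, then copy the saved state into it
model2 = UnetSkipGenerator(3, 8, 128, 16, upsz)
Flux.loadmodel!(model2, JLD2.load("unet_checkpoint.jld2", "model_state"))
model2 = model2 |> gpu

Unlike a loop over Flux.params, loadmodel! also copies non-trainable state such as BatchNorm’s running statistics.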

I know that it is not reproducible; that is why I am asking for suggestions on what to look at. Do you think it could be caused by the structure of UnetSkipGenerator? It is a U-Net with skip connections, basically a Chain of Convs, BatchNorms and leakyrelus with Upsample(:bilinear). I do not really understand what is stored in the reconstruction function that destructure returns.
Saving to .mat files is for my own convenience: I have some original data in .mat files, and every once in a while I check some of my custom functions against their MATLAB counterparts. So far the saved parameters give the results I expect, so I do not think there is anything wrong with using .mat files.

destructure pulls out all the numeric arrays from a nested model structure and flattens them into a single vector. The second return value is a function that undoes this transformation and tries (operative word) to return something matching the original model structure. This unfortunately can be quite error-prone for certain model configurations and layer types, which is why it’d help to see how all the bits of UnetSkipGenerator are defined.
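To illustrate on a toy model (just a sketch with a small Chain, not your UnetSkipGenerator):

using Flux

m = Chain(Dense(2, 3, relu), Dense(3, 1))
θ, re = Flux.destructure(m)   # θ is a flat Vector{Float32} with 13 entries (6+3 and 3+1)
m2 = re(θ)                    # rebuilds a Chain with the same layer structure and weights

x = rand(Float32, 2, 5)
m2(x) ≈ m(x)                  # true for a plain Dense chain like this one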

That’s fine, but generally speaking destructure should be a last resort for serialization, given the caveats mentioned above. So I would first make sure you can reproduce the issue without it, and only then bring it back in.

Just to close this up: the problem was caused by the non-trainable parameters in BatchNorm (the running mean and variance). Flux.params contains only the trainable parameters, so the non-trainable ones were never copied.
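For anyone who lands here later, you can see this on a bare BatchNorm layer (a minimal sketch):

using Flux

bn = BatchNorm(4)
length(Flux.params(bn))   # 2 -- only the trainable shift β and scale γ
bn.μ, bn.σ²               # the running mean and variance live in the layer,
                          # but are not in params, so a params loop never copies them

Copying with Flux.loadmodel! carries these running statistics across as well, since it copies non-trainable state in addition to the trainable parameters.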