How to move optimiser from gpu to cpu?

weiqi · October 12, 2021, 1:38pm

Currently, cpu(optimiser) won’t move it. For instance, the state still consists of variables in GPU.

CarloLucibello · October 12, 2021, 2:40pm

It should work adding Flux.@functor ADAM (or whatever optimizer you are using) to your code. You should open an issue in Flux.jl stating your use case to see if it is worth adding this feature

weiqi · October 12, 2021, 4:27pm

Flux.@functor ADAM doesn’t work.
I thought this is a quite basic functionality as we need to restart the training for large-scale learning. Unless we stay with small problems that can easily finished within couple of hours.
And without loading previously saved optimizer, the training can not be restarted properly.

findmyway · October 12, 2021, 4:45pm

Like Dhairya answered on slack have you tried Optimizers.jl ? Check out the tests for usages.

weiqi · October 12, 2021, 5:31pm

I tried but didn’t figure out how to use it with Flux models? The tested examples are not for neural network models.

DrChainsaw · October 12, 2021, 11:23pm

Getting what you want here might require a bit of extra effort.

Flux’s current optimizers use IdDicts to map weights to optimizer state and when you move parameters to and from the gpu you create new copies. Result is that the IdDict will not recoginize them as the same weighs and instead you have a memory-leak like (weight-leak?) situation.

Depending on what method you use for storing the models and state you might end up with the same problem here before even moving anything (e.g. that weights in optimizer are no longer the same objects as the weights in the model).

I haven’t followed the development of the new optimizers very carefully, but I suppose both new and current optimizers would require you to manually compare weight values (hoping that there are no duplicates) or use some other way to identify the weights and then remap weigths to optimizer state.

weiqi · October 12, 2021, 11:38pm

@DrChainsaw Wonderful insights. Indeed, manually re-mapping the weights and optimizer state is just too much work. The time spent will enable me to re-implement everything in PyTorch

To get around the issue, I guess the Flux optimizer has to store the optimizer state in a string key rather than use the CUDA matrix as a key, which is so unreliable. For example

gpu(cpu(a)) will not be a anymore when a is a CUDA array.

ToucheSir · October 12, 2021, 11:47pm

The new optimizers work off of Zygote’s support for structural gradients. That is, you get a nested (named)tuple back which has the same structure as your model. For those who’ve used JAX-based libraries recently, this may look familiar to you (likewise for state_dict in PyTorch). You can try out Optimisers.jl today, and there should be 0 IdDicts stored anywhere when you use it

weiqi · October 12, 2021, 11:54pm

How to work with Flux models, like Dense, in optimiser.jl? Any example code will be appreciated.

ToucheSir · October 13, 2021, 12:37am

Currently we need a bit more internal plumbing (Optimisers.jl is still experimental) to get most Flux layers working OOTB. https://github.com/FluxML/Optimisers.jl/issues/26 has a good summary there. In the meantime, you can try something like these (warning: untested!) functions:

# change opt type and IdDict field name for whatever you're using
function extract_opt_state(opt::ADAM, model)
    func = Flux.Functors.children(model)
    map(func) do child
        if Flux.isleaf(child)
            get(opt.state, child, nothing)
        else
            extract_opt_state(opt, child)
        end
    end
end

function restore_opt_state!(opt::ADAM, model, state)
    func = Flux.Functors.children(model)
    map(func, state) do child, st
        if Flux.isleaf(child) && st !== nothing
            opt.state[child] = st
        else
            restore_opt_state!(opt, child, st)
        end
    end
end

ToucheSir · December 12, 2021, 6:29pm

So it turns out there is an easier way to go about this, see Deepcopy Flux Model - #9 by ToucheSir.

Topic		Replies	Views
Flux.jl: Save model and optimizer from gpu Machine Learning question	1	445	November 9, 2022
Training with Flux.jl on the GPU causes ArgumentError: cannot take the CPU address of a CuArray GPU question , gpu , flux , machine-learning , neural-network	4	1098	May 28, 2022
Flux.jl: training fails at GPU but works on CPU Machine Learning gpu , flux	1	630	September 19, 2019
Flux: Hard to use train! and DataLoader for minibatched NamedTuple dataset with GPU Machine Learning flux	2	1434	September 24, 2020
Flux on GPU too slow Machine Learning gpu , cuda , flux	5	1111	September 22, 2022

How to move optimiser from gpu to cpu?

Related topics