Saving and loading a model with Flux.jl

I want to save and load a trained model with JLD2. The model is just the getting-started example from the Flux documentation, and it works perfectly fine without the saving and loading step, but I want to plot the data with a previously trained model.

Here’s the code:

FluxNeuron.jl (main module):

module FluxNeuron

# This will prompt if necessary to install everything, including CUDA:
using Flux, CUDA, Statistics, ProgressMeter

    function trainModel(noisy=rand(Float32, 2, 1000), steps=1_000)
        # `noisy` holds the data for the XOR problem: vectors of length 2,
        # as columns of a 2×1000 Matrix{Float32}.
        truth = [xor(col[1]>0.5, col[2]>0.5) for col in eachcol(noisy)]   # 1000-element Vector{Bool}
        # Define our model, a multi-layer perceptron with one hidden layer of size 3:
        model = Chain(
            Dense(2 => 3, tanh),   # activation function inside layer
            BatchNorm(3),
            Dense(3 => 2)) |> gpu        # move model to GPU, if available

        # The model encapsulates parameters, randomly initialised. Its initial output is:
        out1 = model(noisy |> gpu) |> cpu                                 # 2×1000 Matrix{Float32}
        probs1 = softmax(out1)      # normalise to get probabilities

        # To train the model, we use batches of 64 samples, and one-hot encoding:
        target = Flux.onehotbatch(truth, [true, false])                   # 2×1000 OneHotMatrix
        loader = Flux.DataLoader((noisy, target) |> gpu, batchsize=64, shuffle=true);
        # 16-element DataLoader with first element: (2×64 Matrix{Float32}, 2×64 OneHotMatrix)

        optim = Flux.setup(Flux.Adam(0.01), model)  # will store optimiser momentum, etc.

        # Training loop, using the whole data set `steps` times (1000 by default):
        losses = []
        @showprogress for epoch in 1:steps
            for (x, y) in loader
                loss, grads = Flux.withgradient(model) do m
                    # Evaluate model and loss inside gradient context:
                    y_hat = m(x)
                    Flux.logitcrossentropy(y_hat, y)
                end
                Flux.update!(optim, model, grads[1])
                push!(losses, loss)  # logging, outside gradient context
            end
        end
        return model
    end
    export trainModel


end # module FluxNeuron

SaveModel.jl (script run to save the model):

using JLD2, FluxNeuron, Flux
println("Number of steps(default = 1000): ")

steps = something(tryparse(Int, readline()), 1000)

noisy = rand(Float32, 2, 1000)

model = trainModel(noisy, steps)

model_state = Flux.state(model)

jldsave("models/model-$(steps).jld2", model_state = model_state)

PlotLoaded.jl (script to load the model and plot the data):

using Plots  # to draw the figure below
using FluxNeuron, CUDA, Flux, JLD2, Statistics  # Statistics provides `mean`

println("Number of steps(default = 1000): ")

# Take a single line of input from the user:
steps = something(tryparse(Int, readline()), 1000)

println("Number of steps: ", steps)  

noisy = rand(Float32, 2, 1000)
truth = [xor(col[1]>0.5, col[2]>0.5) for col in eachcol(noisy)]   # needed below for plotting and accuracy

# Define our model, a multi-layer perceptron with one hidden layer of size 3:
model = Chain(
    Dense(2 => 3, tanh),   # activation function inside layer
    BatchNorm(3),
    Dense(3 => 2)) |> gpu        # move model to GPU, if available


# The model encapsulates parameters, randomly initialised. Its initial output is:
out1 = model(noisy |> gpu) |> cpu                                 # 2×1000 Matrix{Float32}
probs1 = softmax(out1)      # normalise to get probabilities

model_state = JLD2.load("models/model-$(steps).jld2", "model_state");

println("model_state: ", model_state)

Flux.loadmodel!(model, model_state);

out2 = model(noisy |> gpu) |> cpu  # first row is prob. of true, second row p(false)
probs2 = softmax(out2)      # normalise to get probabilities
mean((probs2[1,:] .> 0.5) .== truth)  # accuracy 94% so far!

p_true = scatter(noisy[1,:], noisy[2,:], zcolor=truth, title="True classification", legend=false)
p_raw =  scatter(noisy[1,:], noisy[2,:], zcolor=probs1[1,:], title="Untrained network", label="", clims=(0,1))
p_done = scatter(noisy[1,:], noisy[2,:], zcolor=probs2[1,:], title="Trained network", legend=false)

plot(p_true, p_raw, p_done, layout=(1,3), size=(1000,330))

On println("model_state: ", model_state) I get this:

model_state: (layers = ((weight = ERROR: LoadError: UndefRefError: access to undefined reference

On Flux.loadmodel!(model, model_state); I get this:

ERROR: LoadError: CUDA error: invalid argument (code 1, ERROR_INVALID_VALUE)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/Tl08O/lib/cudadrv/libcuda.jl:30
  [2] check
    @ ~/.julia/packages/CUDA/Tl08O/lib/cudadrv/libcuda.jl:37 [inlined]
  [3] cuCtxPushCurrent_v2
    @ ~/.julia/packages/CUDA/Tl08O/lib/utils/call.jl:34 [inlined]
  [4] push!
    @ ~/.julia/packages/CUDA/Tl08O/lib/cudadrv/context.jl:126 [inlined]
  [5] device(ctx::CuContext)
    @ CUDA ~/.julia/packages/CUDA/Tl08O/lib/cudadrv/context.jl:286
  [6] device
    @ ~/.julia/packages/CUDA/Tl08O/src/array.jl:346 [inlined]
  [7] unsafe_copyto!(dest::CuArray{Float32, 1, CUDA.DeviceMemory}, doffs::Int64, src::CuArray{Float32, 1, CUDA.DeviceMemory}, soffs::Int64, n::Int64)
    @ CUDA ~/.julia/packages/CUDA/Tl08O/src/array.jl:572
  [8] copyto!
    @ ~/.julia/packages/CUDA/Tl08O/src/array.jl:517 [inlined]
  [9] copyto!
    @ ~/.julia/packages/CUDA/Tl08O/src/array.jl:521 [inlined]
 [10] loadleaf!(dst::CuArray{Float32, 1, CUDA.DeviceMemory}, src::CuArray{Float32, 1, CUDA.DeviceMemory})
    @ Flux ~/.julia/packages/Flux/HBF2N/src/loading.jl:22
 [11] loadmodel!(dst::Dense{typeof(tanh), CuArray{…}, CuArray{…}}, src::@NamedTuple{weight::CuArray{…}, bias::CuArray{…}, σ::Tuple{}}; filter::Function, cache::Base.IdSet{Any})
    @ Flux ~/.julia/packages/Flux/HBF2N/src/loading.jl:103
 [12] loadmodel!(dst::Tuple{Dense{…}, BatchNorm{…}, Dense{…}}, src::Tuple{@NamedTuple{…}, @NamedTuple{…}, @NamedTuple{…}}; filter::Function, cache::Base.IdSet{Any})
    @ Flux ~/.julia/packages/Flux/HBF2N/src/loading.jl:105
 [13] loadmodel!(dst::Chain{Tuple{Dense{…}, BatchNorm{…}, Dense{…}}}, src::@NamedTuple{layers::Tuple{@NamedTuple{…}, @NamedTuple{…}, @NamedTuple{…}}}; filter::Function, cache::Base.IdSet{Any})
    @ Flux ~/.julia/packages/Flux/HBF2N/src/loading.jl:105
 [14] loadmodel!(dst::Chain{Tuple{Dense{…}, BatchNorm{…}, Dense{…}}}, src::@NamedTuple{layers::Tuple{@NamedTuple{…}, @NamedTuple{…}, @NamedTuple{…}}})
    @ Flux ~/.julia/packages/Flux/HBF2N/src/loading.jl:90
 [15] top-level scope
    @ ~/work/julia/FluxNeuron/src/PlotLoaded.jl:29
 [16] include(fname::String)
    @ Base.MainInclude ./client.jl:489
 [17] top-level scope
    @ REPL[2]:1
in expression starting at /home/wiktor/work/julia/FluxNeuron/src/PlotLoaded.jl:29
Some type information was truncated. Use `show(err)` to see complete types.

So the problem probably occurs when saving the model, or maybe I need to load it in a different way?

Before storing the model, try converting it to a CPU model.
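
A minimal sketch of that change in SaveModel.jl (everything else stays the same):

model = trainModel(noisy, steps)

# Move the model back to the CPU before extracting its state, so the
# saved state contains plain Arrays instead of references to GPU memory:
model_state = Flux.state(model |> cpu)

jldsave("models/model-$(steps).jld2", model_state = model_state)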

More info:
GPU models seem to wrap CUDA arrays. The data of a CUDA array lives on the GPU, and the Julia object just holds a reference to it. JLD2 does not know that you would be interested in storing the data rather than the object (i.e., the reference) it was given, and that reference is meaningless after loading in a new session.
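
To see the difference, here is a quick REPL sketch (assuming a functional CUDA setup):

using Flux, CUDA

m = Dense(2 => 3) |> gpu
typeof(m.weight)            # CuArray: a handle to memory living on the GPU
typeof((m |> cpu).weight)   # Matrix{Float32}: plain data that JLD2 can serialise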


Yes, I have also found how to do this in the docs: GPU Support · Flux
Thanks!
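
For completeness, the loading side then looks roughly like this: build the model on the CPU, load the state into it, and only then move it to the GPU:

model = Chain(Dense(2 => 3, tanh), BatchNorm(3), Dense(3 => 2))   # CPU model

model_state = JLD2.load("models/model-$(steps).jld2", "model_state")
Flux.loadmodel!(model, model_state)   # restore the saved parameters

model = model |> gpu   # safe now: the loaded state held plain CPU arrays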