Flux: Resample model with different initializations


I want to run an experiment in Flux in which the model architecture stays the same while the parameters are initialized from normal distributions with different variances. My current setup looks like this:

function build_model(var)
    init_fn(args...) = sqrt(var) * randn(args...)

    return Chain(
        Conv(...usual params..., init = init_fn),
        # Similar layers
        Dense(...usual params..., init = init_fn)
    )
end

Whenever I want a model initialized with a certain variance, I do:

model = build_model(0.01); # model with variance = 0.01
# do some experiment
model = build_model(1.0); # model with variance = 1.0
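
As a side note on the initializer itself: scaling a standard normal sample by sqrt(var) gives samples with variance var, and passing an element type to randn keeps the result in Float32 rather than the default Float64. A minimal sketch in plain Julia, where scaled_normal is a hypothetical helper name and not part of Flux:

```julia
using Random, Statistics

# Hypothetical helper: returns an init function that draws N(0, variance) samples.
# Passing Float32 to randn avoids the default Float64 element type.
scaled_normal(variance) = (dims...) -> sqrt(Float32(variance)) .* randn(Float32, dims...)

Random.seed!(0)
init_fn = scaled_normal(0.01)
w = init_fn(200, 500)              # same call shape as init(out, in)

println(eltype(w))                 # Float32
println(size(w))                   # (200, 500)
println(var(w))                    # sample variance, close to 0.01
```

An init function of this shape could be passed as the init keyword in the same way as init_fn above.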

My understanding is that I am building a new network every time I need a newly sampled model.

Is there a way where I can reuse an existing model and reinitialize the model parameters with a different set of values?

Thanks in advance,

You can access the model weights directly and re-init them in-place. It’ll take some manual work, but a couple of helper functions would alleviate that.

Hi @ToucheSir ,

Could you please give me some pointers or examples on how to do it?

I tried the following with a Dense layer, but I am getting an error:

julia> m.layers[3].weight |> size
(4, 10)

julia> m.layers[3].weight = rand(4, 10)
ERROR: setfield! immutable struct of type Dense cannot be changed
 [1] setproperty!(x::Dense{typeof(relu), Matrix{Float32}, Vector{Float32}}, f::Symbol, v::Matrix{Float64})
   @ Base ./Base.jl:34
 [2] top-level scope
   @ REPL[11]:1

You’re trying to replace the entire parameter field of the layer (which is an immutable struct) instead of updating the array in place. Think m.layers[3].setWeight(rand(...)) in another language. Since the objective appears to be avoiding unnecessary allocations, you’d want to update the contents of the existing array instead:

m.layers[3].weight .= rand(4, 10)

(note the dot)
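
The difference between rebinding a field and writing into the array it holds can be reproduced without Flux. A minimal sketch, assuming a hypothetical immutable struct ToyDense that stands in for a layer:

```julia
# Hypothetical stand-in for a Flux layer: an immutable struct holding a mutable array.
struct ToyDense
    weight::Matrix{Float32}
end

layer = ToyDense(zeros(Float32, 4, 10))
buffer = layer.weight                  # keep a reference to the original array

# Rebinding the field throws, because the struct itself is immutable.
rebind_failed = try
    layer.weight = rand(Float32, 4, 10)
    false
catch
    true
end

# Broadcasting with .= copies new values into the existing array instead.
layer.weight .= rand(Float32, 4, 10)

println(rebind_failed)                 # true
println(layer.weight === buffer)       # true: still the same array object
```

Note that the right-hand side rand(...) still allocates a temporary array before the values are copied over.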

Hi @ToucheSir, thanks for the pointers. I followed the solution you proposed and wrote a resample_model! function (Method 2 below).

I tried to benchmark both to see which one is better. I was under the assumption that in-place update of parameters should do better. Below are the necessary snippets.

# Function to build the model
function LeNet5(; initializer=Flux.glorot_uniform, act_fn=relu, im_size=(28, 28, 1), n_classes=10)
    out_conv_size = (im_size[1] ÷ 4 - 3, im_size[2] ÷ 4 - 3, 16)
    return Chain(
        Conv((5, 5), im_size[end] => 6, act_fn, init=initializer),
        MaxPool((2, 2)),
        Conv((5, 5), 6 => 16, act_fn, init=initializer),
        MaxPool((2, 2)),
        Dense(prod(out_conv_size), 120, act_fn, init=initializer),
        Dense(120, 84, act_fn, init=initializer),
        Dense(84, n_classes, init=initializer)
    )
end

Method 1: Call the build function every time

function test_reinit_1()
    model = LeNet5()

    for i in 1:N
        model = LeNet5()
    end
end

Method 2: In-place parameter update

function resample_model!(model)
    for layer ∈ model
        if hasproperty(layer, :weight)
            layer.weight .= randn(size(layer.weight))
        end
        if hasproperty(layer, :bias)
            layer.bias .= randn(size(layer.bias))
        end
    end
end

function test_reinit_2()
    model = LeNet5()

    for i in 1:N
        resample_model!(model)
    end
end

Below are the benchmark results for N = 100:

julia> @benchmark test_reinit_1()
  memory estimate:  34.66 MiB
  allocs estimate:  12322
  minimum time:     6.986 ms (0.00% GC)
  median time:      9.145 ms (14.77% GC)
  mean time:        9.143 ms (12.06% GC)
  maximum time:     12.402 ms (12.52% GC)
  samples:          546
  evals/sample:     1
julia> @benchmark test_reinit_2()
  memory estimate:  34.86 MiB
  allocs estimate:  9522
  minimum time:     16.408 ms (0.00% GC)
  median time:      23.207 ms (6.69% GC)
  mean time:        22.937 ms (7.06% GC)
  maximum time:     29.082 ms (9.05% GC)
  samples:          218
  evals/sample:     1

It seems that method 1 (calling the model-building function over and over) is faster than my current implementation of method 2. This looks odd, as I was expecting the opposite result.

Any suggestions on how I can speed up method 2 here?

I thought there was an in-place version of randn, but perhaps I was mistaken. Given the memory estimate is similar, it may be easier for you to just go with method 1.
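
For reference, Julia's Random standard library does provide randn!, which fills an existing array in place, so the temporary Float64 array created by randn(size(...)) can be avoided. A minimal sketch, assuming layers expose weight and bias fields as in the posts above; the toy named tuples stand in for Flux layers, and the variance keyword is an illustrative addition:

```julia
using Random

# Refill every weight and bias array in place with N(0, variance) samples.
# Works on any iterable of objects; entries without these fields are skipped.
function resample_model!(model; variance = 1.0)
    for layer in model
        for name in (:weight, :bias)
            hasproperty(layer, name) || continue
            p = getproperty(layer, name)
            p isa AbstractArray || continue    # skip non-array fields
            randn!(p)                          # in-place standard normal draw
            p .*= sqrt(eltype(p)(variance))    # rescale to the requested variance
        end
    end
    return model
end

# Toy stand-ins for layers (hypothetical); the second one has no parameters.
toy = ((weight = zeros(Float32, 4, 10), bias = zeros(Float32, 4)),
       (pool = (2, 2),))
w = toy[1].weight
resample_model!(toy; variance = 0.01)

println(toy[1].weight === w)                   # true: same array, new values
println(eltype(w))                             # Float32
```

Whether this beats rebuilding the model would still need to be benchmarked on the real network.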

Thanks for the comments. Maybe method 1 is the best option so far.