Flux.jl: Different set of optimisation parameters per layer

Hi,

Just copying my question over from the GitHub issues in the hope of getting more visibility here.

In Flux.jl, I’d like to use a separate set of optimisation parameters for each hidden layer in a single Chain. For example, if I had a pre-trained network that I wanted to append new layers to, I’d want the learning rate for the pre-trained layers to be much lower than the learning rate for the new layers. Maybe I’d want the momentum, dropout, etc. to differ as well.

From the source code it looks like using multiple optimisers during training may once have been possible, but nothing in the current documentation (as far as I can tell) indicates that it still is. Am I missing something obvious? Thanks in advance!

Edit: Using Julia 1.0.3 and Flux 0.7.3

Update: the `distribute` helper and the modified `train!` below should work just fine for my purposes.

using Flux

# Repeat each layer's optimiser once per parameter array in that layer, so the result lines up one-to-one with params(m).
distribute(opts, m) = collect(Iterators.flatten([repeat([opt], length(params(l))) for (l, opt) in zip(m, opts)]))
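
Each Dense layer carries two parameter arrays (its weight matrix and its bias), so `distribute` repeats a layer's optimiser once per array and the result matches `params(m)` element for element. A quick sanity check (the names here are just illustrative):

m_check = Chain(Dense(2, 4, tanh), Dense(4, 2, tanh))
opts_check = distribute([Descent(0.3), Descent(0.1)], m_check)
length(opts_check) == length(params(m_check))   # true: 4 optimisers for 4 parameter arrays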

# Adapted from Flux's own train! in Flux.Optimise, but takes an array of
# optimisers (one per parameter array) instead of a single optimiser.
# runall, StopException, Params, gradient and update! are the same Flux
# internals the stock train! uses.
function train!(loss, ps, data, opts::Array; cb = () -> ())
  cb = runall(cb)
  ps = Params(ps)
  for d in data
    try
      gs = gradient(ps) do
        loss(d...)
      end
      # apply each parameter's own optimiser to its gradient
      for (p, opt) in zip(ps, opts)
        update!(opt, p, gs[p])
      end
      if cb() == :stop
        Base.depwarn("Use of `:stop` is deprecated; use `Flux.stop()` instead", :stop)
        break
      end
    catch ex
      if ex isa StopException
        break
      else
        rethrow(ex)
      end
    end
  end
end

m = Chain(Dense(2, 4, tanh), Dense(4, 2, tanh))

opts = distribute([Descent(0.3), Descent(0.1)], m) # [Descent(0.3), Descent(0.3), Descent(0.1), Descent(0.1)]

@epochs 500 train!(loss, params(m), data, opts)
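
(For completeness: `@epochs` comes from Flux, and `loss` and `data` above could be any usual Flux setup. Here's a made-up minimal one, purely as a placeholder rather than anything from my actual problem:)

using Flux: @epochs, mse

# hypothetical toy data and loss, only so the snippet above has something to run on
x = rand(2, 100)            # 100 random 2-feature inputs (columns are samples)
y = rand(2, 100)            # matching 2-dimensional targets
data = [(x, y)]             # a single batch per epoch
loss(x, y) = mse(m(x), y)   # mean squared error through the Chain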