Just copying over my question from the GitHub issues to hopefully get more visibility here.
In Flux.jl, I’d like to use a separate set of optimisation parameters for each hidden layer in a single Chain. For example, if I had a pre-trained network I wanted to append layers to, I’d want the learning rate for the pre-trained layers to be much lower than the learning rate for my new layers. Maybe I’d want the momentum, dropout, etc. to differ per layer as well.
Looking at the source code, using multiple Optimisers during training may once have been possible, but nothing in the current documentation (as far as I can tell) indicates that it still is. Am I missing something obvious? Thanks in advance!
Edit: Using Julia 1.0.3 and Flux 0.7.3
Update: the following should work just fine for my purposes.
```julia
# Fan one optimiser per layer out to one optimiser per parameter
# (each Dense layer contributes two params: W and b).
distribute(opts, m) = collect(Iterators.flatten(
    [repeat([opt], length(params(l))) for (l, opt) in zip(m, opts)]))

# Adapted from Flux.Optimise.train!, but taking an Array of optimisers,
# one per parameter. `runall`, `depwarn`, and `StopException` are Flux
# internals from Flux.Optimise.
function train!(loss, ps, data, opts::Array; cb = () -> ())
  cb = runall(cb)
  ps = Params(ps)
  for d in data
    try
      gs = gradient(ps) do
        loss(d...)
      end
      # Apply each parameter's own optimiser to its gradient.
      for (p, opt) in zip(ps, opts)
        update!(opt, p, gs[p])
      end
      if cb() == :stop
        depwarn("Use of `:stop` is deprecated; use `Flux.stop()` instead", :stop)
        break
      end
    catch ex
      if ex isa StopException
        break
      else
        rethrow(ex)
      end
    end
  end
end

m = Chain(Dense(2, 4, tanh), Dense(4, 2, tanh))
opts = distribute([Descent(0.3), Descent(0.1)], m)
# opts == [Descent(0.3), Descent(0.3), Descent(0.1), Descent(0.1)]
@epochs 500 train!(loss, params(m), data, opts)
```
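For anyone reading along without Flux installed, the fan-out that `distribute` performs can be sketched in plain Julia with stand-in types (the `FakeLayer`/`FakeOpt` names are hypothetical, not Flux API):

```julia
# Stand-ins: each "layer" owns a list of parameter arrays (think W and b),
# and each "optimiser" just carries a learning rate.
struct FakeLayer
    ps::Vector{Vector{Float64}}
end
struct FakeOpt
    eta::Float64
end

# Same shape as the distribute above: repeat a layer's optimiser once
# per parameter array owned by that layer.
distribute(opts, layers) = collect(Iterators.flatten(
    [repeat([opt], length(l.ps)) for (l, opt) in zip(layers, opts)]))

layers = [FakeLayer([[1.0], [2.0]]), FakeLayer([[3.0], [4.0]])]  # two params each
opts = distribute([FakeOpt(0.3), FakeOpt(0.1)], layers)
println([o.eta for o in opts])  # [0.3, 0.3, 0.1, 0.1]
```

Zipping this expanded list against `params(m)` is what lets the modified `train!` pair every parameter with its own optimiser.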