Is it possible to use a custom constrained optimizer or to modify the gradients to only allow for positive weights and biases?
I want to say this has been discussed before on Discourse, but since I can't find a thread, I'd suggest looking at Feature request: Modifying Dense Layer to accommodate kernel/bias constraints and kernel/bias regularisation · Issue #1389 · FluxML/Flux.jl · GitHub. Note that we are moving away from this "implicit" Params
model towards storing both model weights and gradients in proper structures. If you'd like something more future-proof, have a look at Home · Optimisers.jl.
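For concreteness, here is a minimal sketch of what a "constrained optimizer" could look like in the explicit Optimisers.jl style: plain gradient descent followed by a projection (clamp) of every array parameter back onto the non-negative orthant. This is my own illustration, not code from the linked issue; the function name train_nonneg, the loss, and the toy data are made up.
using Flux, Optimisers
using Functors: fmap

function train_nonneg(model, x, y; epochs = 100)
    opt_state = Optimisers.setup(Optimisers.Descent(0.01), model)
    for _ in 1:epochs
        grads = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)[1]
        opt_state, model = Optimisers.update(opt_state, model, grads)
        # projection step: clamp weights and biases back to the non-negative orthant
        model = fmap(p -> p isa AbstractArray ? clamp.(p, 0f0, Inf32) : p, model)
    end
    return model
end

model = train_nonneg(Dense(3, 2, relu), rand(Float32, 3, 5), rand(Float32, 2, 5))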
Thanks. I tried the regularization path but I couldn't make it work, so I am looking into the custom optimizer now.
An alternative approach is to introduce an activation function for the weights. That way the weights "as applied" are non-negative, but they possess a real-valued, learnable "latent state".
using Flux
m = Dense(3, 2, relu)
x = rand(Float32, 3, 5)
(a::Dense)(x::AbstractVecOrMat, g) = a.σ.(g.(a.weight)*x .+ g.(a.bias)) # apply g elementwise to weights and bias before the usual affine map
y1 = m(x) # normal forward pass
y2 = m(x, relu) # forward pass with non-negative weights
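To sketch how training looks with this trick (a hedged example, not from the post above; the target y and the mse loss are placeholders): the loss is computed through the two-argument forward pass, so the gradient reaches the real-valued latent weights through g.
y = rand(Float32, 2, 5)  # dummy target, for illustration only
grads = Flux.gradient(m) do model
    Flux.Losses.mse(model(x, relu), y)  # weights enter the loss as relu.(weight), bias as relu.(bias)
end
The returned gradient is with respect to the unconstrained weight and bias arrays, so any standard optimiser can be applied to them directly.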
Oooh that is a good idea, now maybe a stupid question…
I don't see any problem using normal gradient descent, but do you think it would create problems with a method requiring a Hessian (stochastic L-BFGS) for training, since it depends on information from the previous gradient? (That was the main reason I was looking for a constrained optimizer.)
I don't think it should be an issue, but I might be missing something.
But in general, non-negative weights make a network a lot less expressive: you are limiting the weights to a single orthant. I think both approaches have drawbacks:
- Optimizer approach: how to efficiently project back onto the feasible domain.
- Weight-activation approach: how to avoid dying or saturated synapses (ReLU and sigmoid cases, respectively; see the one-line sketch below).
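As a purely illustrative mitigation for the second point (my own suggestion, not something discussed above), a smooth, strictly positive weight activation such as softplus can be dropped into the same two-argument forward pass defined earlier:
y3 = m(x, softplus) # softplus(w) > 0 everywhere and has no zero-gradient region, unlike relu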