Is it possible to use a custom constrained optimizer or to modify the gradients to only allow for positive weights and biases?
I want to say this has been discussed before on Discourse, but since I can't find a thread, I'd suggest looking at Feature request: Modifying Dense Layer to accommodate kernel/bias constraints and kernel/bias regularisation · Issue #1389 · FluxML/Flux.jl · GitHub. Note that we are moving away from this "implicit" Params
model towards storing both model weights and gradients in proper structures. If you'd like something more future-proof, have a look at Home · Optimisers.jl.
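For concreteness, here is a minimal sketch of what a "constrained optimizer" could look like in the explicit Optimisers.jl style: plain gradient descent followed by a projection (clamp) of every array parameter back onto the non-negative orthant. This is my own illustration, not code from the linked issue; the function name train_nonneg, the loss, and the toy data are made up.
using Flux, Optimisers
using Functors: fmap

function train_nonneg(model, x, y; epochs = 100)
    opt_state = Optimisers.setup(Optimisers.Descent(0.01), model)
    for _ in 1:epochs
        grads = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)[1]
        opt_state, model = Optimisers.update(opt_state, model, grads)
        # projection step: clamp weights and biases back to the non-negative orthant
        model = fmap(p -> p isa AbstractArray ? clamp.(p, 0f0, Inf32) : p, model)
    end
    return model
end

model = train_nonneg(Dense(3, 2, relu), rand(Float32, 3, 5), rand(Float32, 2, 5))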
Thanks. I tried the regularization path but I couldn't make it work, so I am looking into the custom optimizer now.
An alternative approach is to introduce an activation function for the weights. That way the weights "as applied" are non-negative, but they possess a real-valued, learnable "latent state".
using Flux
m = Dense(3, 2, relu)
x = rand(Float32, 3, 5)
(a::Dense)(x::AbstractVecOrMat, g) = a.σ.(g.(a.weight)*x .+ g.(a.bias)) # apply g elementwise to weights and bias before the usual affine map
y1 = m(x) # normal forward pass
y2 = m(x, relu) # forward pass with non-negative weights
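To sketch how training looks with this trick (a hedged example, not from the post above; the target y and the mse loss are placeholders): the loss is computed through the two-argument forward pass, so the gradient reaches the real-valued latent weights through g.
y = rand(Float32, 2, 5)  # dummy target, for illustration only
grads = Flux.gradient(m) do model
    Flux.Losses.mse(model(x, relu), y)  # weights enter the loss as relu.(weight), bias as relu.(bias)
end
The returned gradient is with respect to the unconstrained weight and bias arrays, so any standard optimiser can be applied to them directly.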
Oooh that is a good idea, now maybe a stupid question…
I don't see any problem using normal gradient descent, but do you think it would create problems with a method requiring a Hessian (stochastic L-BFGS) for training, since it depends on information from the previous gradient? (That was the main reason I was looking for a constrained optimizer.)
I don't think it should be an issue, but I might be missing something.
But in general, non-negative weights make a network a lot less expressive: you are limiting the weights to a single orthant. I think both approaches have drawbacks:
- Optimizer approach: how to efficiently project back onto the feasible domain.
- Weight-activation approach: how to avoid dying or saturated synapses (ReLU and sigmoid cases, respectively; see the one-line sketch below).
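As a purely illustrative mitigation for the second point (my own suggestion, not something discussed above), a smooth, strictly positive weight activation such as softplus can be dropped into the same two-argument forward pass defined earlier:
y3 = m(x, softplus) # softplus(w) > 0 everywhere and has no zero-gradient region, unlike relu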