How to embed expert knowledge/constraints in ML training?

I want to train a model to predict the timber volume growth in a forest as a function of pedological/site variables and climatic data.

So, I have something like this:

ip_point     X      Y    vol   vol_growth   sand_pc   soil_depth   precipitation   temp
       1  92.1   12.3    100           10       0.2           50             200     15
       2  91.4   15.2    121            9       0.5           60             150     16
       3  90.2   14.1    110            8       0.3           55             180     14
       4  89.8   13.5     95            7       0.4           52             170     15

Of course, I could “simply” train to predict vol_growth as a function of the other variables (including vol), but I want to constrain the growth to follow a logistic function, i.e. to follow this equation (in the growth/state space):

\dot V = aV - (a/b) V^2

And I want to predict the two parameters a and b, but of course at each point I have a single observation. So my question is: how do I train an ML model while making sure that, with respect to the vol variable, it follows a logistic function? Add a penalty term to the loss? Fit an unrestricted model and then fit its predictions (e.g. with LsqFit) to the logistic model?
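
For reference, the same equation in the standard logistic form, which makes the roles of the two parameters explicit (a is the intrinsic growth rate, b the asymptotic volume, i.e. the carrying capacity), together with its textbook closed-form solution:

\dot V = a V \left(1 - \frac{V}{b}\right), \qquad V(t) = \frac{b}{1 + \left(\frac{b}{V_0} - 1\right) e^{-a t}}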

Is vol_growth here a measurement of \dot V?

yes, it is \dot V (in my real dataset, I have V in two observations 5 years apart)

In that case, standard linear least squares would be applicable:

A = [V V.^2]   # regressor matrix with columns V and V²
y = V_dot      # observed growth rates
x = A\y        # linear least-squares solution
a = x[1]       # coefficient of V
b = -a/x[2]    # since the coefficient of V² is -a/b

This minimizes
\sum_i \big(\dot V_i - (a V_i - \tfrac{a}{b} V_i^2)\big)^2

This implicitly makes the assumption that there are only measurement errors present in \dot V; if you expect significant errors also in the measurement of V, you may want to consider an errors-in-variables approach.
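
For completeness, one standard errors-in-variables method is total least squares; a minimal sketch on dummy V and V_dot (all it needs):

using LinearAlgebra
V     = 20 .+ 10 .* rand(100)                            # dummy volumes
V_dot = 0.5 .* V .- 0.005 .* V.^2 .+ 0.1 .* randn(100)   # dummy noisy growth
M = [V V.^2 V_dot]            # augmented data matrix [A y]
F = svd(M)
n = F.V[:, end]               # right singular vector of the smallest singular value
x = -n[1:2] ./ n[3]           # TLS solution of [V V.^2] * x ≈ V_dot
a = x[1]
b = -a / x[2]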


I am not sure I understood, but this fits \dot V to the function; I can’t see the relation to the other variables... My idea is that the growth is influenced by the other variables (soil, climate, …).

Ah I see, in that case, you could make a and b functions of these predictors, a = a(p_c, d, p) or something like that.
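
For example, a minimal sketch of that idea with LsqFit, assuming (purely for illustration) that a and b are linear in three dummy covariates:

using LsqFit

N    = 200
X    = rand(N, 3)                          # dummy covariates (e.g. sand_pc, soil_depth, precipitation)
V    = 20 .+ 10 .* rand(N)                 # current volumes
a_tr = 0.5 .+ X * [0.1, 0.2, 0.1]          # "true" a(covariates)
b_tr = 100 .+ X * [10.0, 5.0, 8.0]         # "true" b(covariates)
Vdot = a_tr .* V .- (a_tr ./ b_tr) .* V.^2 .+ 0.01 .* randn(N)

# a and b are linear functions of the covariates; p packs their coefficients
function growth_model(XV, p)
    Xc, Vc = XV[:, 1:end-1], XV[:, end]
    a = Xc * p[1:3] .+ p[4]
    b = Xc * p[5:7] .+ p[8]
    return a .* Vc .- (a ./ b) .* Vc .^ 2
end

p0  = [0.1, 0.1, 0.1, 0.5, 1.0, 1.0, 1.0, 50.0]   # rough starting values
res = curve_fit(growth_model, hcat(X, V), Vdot, p0)
coef(res)                                          # estimated coefficients of a(·) and b(·)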

The problem feels a bit underspecified though, considering that you have no evolution over time in your dataset. I could imagine this approach working well if you had several measurements from each site, but with data like this I’m not sure.


I don’t know what kind of “ML” you are interested in, but constrained splines might help you. Constrained P-splines have nice properties and have been around for a while (and can handle monotonicity and other shape constraints). You can check out this book on P-splines, which has a large amount of nicely written R code that is extremely simple to translate to Julia: https://psplines.bitbucket.io/

Simon Wood (of mgcv fame) has a paper on embedding shape-constrained splines in generalized additive models here: “Shape constrained additive models”, Statistics and Computing, with an accompanying R package.


In my experience (which is admittedly getting pretty ancient at this point), the best way to embed expert knowledge into simple (and frequently small) tabular datasets like these is through feature engineering. That’s exactly what @baggepinnen is doing by regressing against both V and V.^2: instead of needing a complicated non-linear model to learn that such a nonlinearity is important, you just use a linear model with the non-linearity you expect to be relevant as an input.

But time is indeed a major missing variable. You typically want to predict growth over the next X months, and in that case you will also want to feed your model windowed historical data, e.g. precipitation totals over the past month, over the past three months, over the past year, etc. Again, those kinds of engineered features are things that can be hard for a complicated model to “learn” from the basic dataset, but once you’ve engineered them, simple models like decision trees can give great performance.
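
For instance, a minimal sketch of building such windowed features from a monthly precipitation series (hypothetical variable names, plain Julia):

precip = 100 .* rand(120)                          # 10 years of monthly precipitation totals (dummy)

# total precipitation over the last w months up to and including month t
window_sum(x, t, w) = sum(@view x[max(1, t-w+1):t])

features = [(p1  = window_sum(precip, t, 1),
             p3  = window_sum(precip, t, 3),
             p12 = window_sum(precip, t, 12)) for t in 12:length(precip)]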


I am attempting to “solve” the issue (for now on dummy data) by creating a multi-branch neural network, where one branch predicts a and b based on soil, precipitation (at time t, t-1, etc.) and another branch carries the volumes across the network. Finally, a terminal layer embeds the structure that I want to maintain (logistic with respect to the current volume), computed as a function of (a, b, v0).
This way I avoid estimating a and b separately, as they are unobserved.

This is the code on dummy data:

using BetaML, Plots

# Dummy data creation....
N = 4000
x = rand(1:0.001:2,N,5)
e = rand(N)/1000
a = @. x[:,1]^2 - 2 * x[:,2]^2 + 3 * x[:,3]^2 - 4 * x[:,4]^1 + 2 * x[:,5] + e + 10
b = @. (2 *x[:,1]^2 + 4 * x[:,2]^2 + 5 * x[:,3] + 4 * x[:,4]^1 + 2 * x[:,5] + e *10) *100
v0 = rand(17:0.001:22,N)
xv = hcat(x,v0)
dv = @. a * v0 - (a/b) * v0^2


# Method 1: direct estimation of dv...
layers = [DenseLayer(6,10,f=relu),
          DenseLayer(10,10,f=relu),
          DenseLayer(10,1,f=identity)]
m = NeuralNetworkEstimator(epochs=200, batch_size=4, layers=layers)
dv_est = fit!(m, xv, dv)

hcat(dv,dv_est)
mse(dv,dv_est) # 51.61

# Method 2: Structure estimation...
# I create a NN with a branch for the parameters a and b and a separate branch for the volume, and then I create a (weightless) function layer that computes the growth from the two parameters and the current volume.
dvcomp(x) = [x[1] * x[3] - (x[1] / x[2]) * x[3]^2] # x is [a, b, V] - output must be a vector
l1_ab  = DenseLayer(5,10,f=relu)
l1_v   = ReplicatorLayer(1)
l1     = GroupedLayer([l1_ab,l1_v])
l2_ab  = DenseLayer(10,10,f=relu)
l2_v   = ReplicatorLayer(1)
l2     = GroupedLayer([l2_ab,l2_v])
l3_ab  = DenseLayer(10,2,f=identity)
l3_v   = ReplicatorLayer(1)
l3     = GroupedLayer([l3_ab,l3_v])
l4     = VectorFunctionLayer(3,f=dvcomp)

layers=[l1,l2,l3,l4]
m = NeuralNetworkEstimator(epochs=200, batch_size=4, layers=layers)
dv_est = fit!(m, xv, dv)

hcat(dv,dv_est)
mse(dv,dv_est) # 5.23
plot(dv,dv_est, label=nothing,xlabel="obs",ylabel="est")

The predictions (on dummy data) seem very good:

However, I may be doing something very dumb, as the model structure seems not to be respected... going to look into this...


I mean, it looks like you’re training models with hundreds of parameters… and then testing against the training data? With 4000 data points? That’s a lot of parameters for not much data; surely there’s a goodly amount of overfitting going on.


Time for splines? :slightly_smiling_face:


Sure, but I wasn’t considering that in this first attempt... In my application I have hundreds of thousands of data points, and I’ll do a train/test split and cross-validate to control for overfitting…
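
For the record, a minimal held-out evaluation sketch along those lines, reusing N, xv, dv and layers from the script below and plain index shuffling rather than any specific splitting utility:

using Random
idx    = shuffle(1:N)
ntrain = round(Int, 0.8*N)
itrain, itest = idx[1:ntrain], idx[ntrain+1:end]

m_ho   = NeuralNetworkEstimator(epochs=100, batch_size=8, layers=layers)
fit!(m_ho, xv[itrain, :], dv[itrain])
dv_hat = predict(m_ho, xv[itest, :])
mse(dv[itest], dv_hat)        # out-of-sample error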

The only problem I have now is that when I use the “structured” version the network is very hard to train, in the sense that in half of the attempts (depending on the initial values of the weights), with the standard gradient-based optimisers (SGD or ADAM), the loss gets stuck in some local minimum...
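
One generic workaround for that (not from this thread, just a sketch) is a multi-restart loop: rebuild and retrain the structured network a few times and keep the run with the lowest training loss. Here build_structured_layers() is a hypothetical helper wrapping the l1..l4 construction from the revised code below, so that each attempt starts from a fresh random weight initialisation:

function best_of_restarts(xv, dv; attempts=5)
    best_m, best_loss = nothing, Inf
    for _ in 1:attempts
        # fresh layers => fresh random initial weights on every attempt
        m_try  = NeuralNetworkEstimator(epochs=100, batch_size=8, layers=build_structured_layers())
        dv_try = fit!(m_try, xv, dv)
        l      = mse(dv, dv_try)
        if l < best_loss
            best_m, best_loss = m_try, l
        end
    end
    return best_m, best_loss
end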

Revised code
using Pkg
Pkg.activate(@__DIR__)
using Revise

using Random, Pipe, BetaML, Plots

Random.seed!(200) #100

# Data creation....
N = 5000
x = rand(1.5:0.001:2,N,5)
e = rand(N)/1000
a = @. (x[:,1]^2 - 2 * x[:,2]^2 + 3 * x[:,3]^2 - 4 * x[:,4]^1 + 2 * x[:,5] + e + 10) / 100
b = @. (2 *x[:,1]^2 + 4 * x[:,2]^2 + 5 * x[:,3] + 4 * x[:,4]^1 + 2 * x[:,5] + e + 10) * 10
v0 = rand(20:0.001:minimum(b),N)
xv = hcat(x,v0)
xvvsq = hcat(x, v0, v0.^2)
dv = @. a * v0 - (a/b) * v0^2
scatter(v0, dv)

# Method 1: direct estimation of dv...
layers = [DenseLayer(6,10,f=relu),
          DenseLayer(10,10,f=relu),
          DenseLayer(10,1,f=relu)]
m = NeuralNetworkEstimator(epochs=100, batch_size=4, layers=layers)
dv_est = fit!(m, fit!(Scaler(),xv), dv)

hcat(dv,dv_est)
mse(dv,dv_est) # 51.61
scatter(dv,dv_est, label=nothing,xlabel="obs",ylabel="est")

# Method 2: Structure estimation...
# I create a NN with a branch for the parameters a and b and a separate branch for the volume, and then I create a (weightless) function layer that computes the growth from the two parameters and the current volume.
dvcomp(x) = [x[1] * x[3] - (0.001*x[2]) * x[3]^2] # x is [a, c, V], with a/b = 0.001*c - output must be a vector
l1_ab  = DenseLayer(5,5,f=relu)
l1_v   = ReplicatorLayer(1)
l1     = GroupedLayer([l1_ab,l1_v])
l2_ab  = DenseLayer(5,5,f=relu)
l2_v   = ReplicatorLayer(1)
l2     = GroupedLayer([l2_ab,l2_v])
l3_ab  = DenseLayer(5,2,f=relu)
l3_v   = ReplicatorLayer(1)
l3     = GroupedLayer([l3_ab,l3_v])
l4     = VectorFunctionLayer(3,f=dvcomp)

layers=[l1,l2,l3,l4]
m = NeuralNetworkEstimator(epochs=100, batch_size=8, layers=layers)
dv_est = fit!(m, xv, dv)

hcat(dv,dv_est)
mse(dv,dv_est) # 5.23

scatter(dv,dv_est, label=nothing,xlabel="obs",ylabel="est")

function getabv(m,x,i)
    r = x[i,:]
    comp_layers = parameters(m).nnstruct.layers
    xi_last = @pipe forward(comp_layers[1],r) |> forward(comp_layers[2],_) |> forward(comp_layers[3],_)
    a = xi_last[1]
    b = (a/xi_last[2])/0.001  # invert the 0.001 scaling used in dvcomp
    v = xi_last[3]
    return (a,b,v)
end

# Checking logistic model is enforced...
v = 0:500
j = rand(1:N)
xv_check = transpose(hcat([vcat(x[j,:],v[i]) for i in 1:length(v)]...))
(ae,be,vj) = getabv(m,xv,j)   # estimated a, b and the (replicated) volume for observation j
(a[j],b[j])                   # compare with the "true" values
dv_est = predict(m, xv_check)
plot(v,dv_est, label=nothing, xlabel="vol", ylabel="dvol", title="NN growth predictions on a single plot") # This should give me a bell shape curve

The neural network predictions on a single pixel now follow the logistic growth:
