ANN: Bayesian Linear Regression

I’m happy to announce the release of BayesianLinearRegression.jl. Initial work on this package began in 2015, but there was some significant bitrot. Thanks to RelationalAI for supporting this development!

I’ve started registration for this package, but it’s currently in the waiting period. It should be registered within the next few days.

If you’re new to Bayesian modeling, you can think of this as ridge regression with automatically-tuned regularization.

For all the details, see Chapter 3 of Pattern Recognition and Machine Learning by Chris Bishop.
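
For orientation, here's a brief sketch of the model in notation close to Bishop's:

y = X w + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \qquad w \sim \mathcal{N}(0, s^2 I)

Fitting maximizes the marginal likelihood (the "evidence") over the prior scale s and the noise scale \sigma; in Bishop's notation these correspond to the precisions \alpha = 1/s^2 and \beta = 1/\sigma^2.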

Say we have something like

julia> X = randn(100,10);

julia> β = randn(5);

julia> y = X[:,1:5] * β + randn(100);

Then we can say

julia> m = BayesianLinReg(X,y);

julia> fit!(m)
BayesianLinReg model

Log evidence: -156.14
Prior scale:  0.57
Noise scale:  0.97

Coefficients:
  1:  1.54 ± 0.09
  2:  0.46 ± 0.10
  3:  0.55 ± 0.09
  4:  0.16 ± 0.11
  5: -0.43 ± 0.10
  6:  0.03 ± 0.10
  7:  0.03 ± 0.10
  8:  0.18 ± 0.11
  9: -0.07 ± 0.11
 10:  0.04 ± 0.10

This shows a few things:

  1. The log evidence, which can be used for model selection,
  2. The prior scale, the standard deviation of a “reasonable” zero-mean prior on each weight,
  3. The noise scale, which you can think of as \hat{\sigma} (see the ridge-regression sketch below), and
  4. The coefficients, or posterior weights, with standard errors.
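
To connect this back to the ridge-regression view mentioned above, here's a rough sanity check (a sketch, not part of the package API): with prior scale s and noise scale \sigma, the posterior mean should be close to the ridge solution with regularization \lambda = \sigma^2 / s^2. The values of s and σ below are just read off the printed output (so they're rounded), and posteriorWeights is the accessor shown further down.

julia> using LinearAlgebra, Measurements

julia> s, σ = 0.57, 0.97;   # rounded values printed above

julia> λ = σ^2 / s^2;

julia> β_ridge = (X'X + λ*I) \ (X'y);

julia> β_post = Measurements.value.(posteriorWeights(m));   # strip the uncertainties

julia> maximum(abs.(β_ridge .- β_post))   # expected to be small (only approximate, since s and σ are rounded)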

Note that thanks to the clever tricks of Measurements.jl, the posterior weights include correlation information:

julia> w = posteriorWeights(m);

julia> w[1].der
Measurements.Derivatives{Float64} with 10 entries:
  (0.0, 1.0, 0x000000000000000a) => -0.00411411
  (0.0, 1.0, 0x0000000000000008) => 0.0106236
  (0.0, 1.0, 0x0000000000000006) => -0.00479298
  (0.0, 1.0, 0x0000000000000004) => -0.00320726
  (0.0, 1.0, 0x0000000000000002) => 0.0920992
  (0.0, 1.0, 0x0000000000000003) => 0.00134676
  (0.0, 1.0, 0x0000000000000005) => -0.00552799
  (0.0, 1.0, 0x0000000000000007) => -0.00393502
  (0.0, 1.0, 0x0000000000000009) => 0.000518892
  (0.0, 1.0, 0x000000000000000b) => -0.00409676
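
For instance (a small sketch using only the weights above), uncertainty propagates through linear combinations with those correlations taken into account: the standard error of w[1] - w[2] accounts for their correlation rather than assuming independence, and a weight minus itself comes out with exactly zero uncertainty:

julia> w[1] - w[1]   # zero uncertainty, since the correlation is tracked
0.0 ± 0.0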

We can also change the active weights (any inactive weights are set to zero):

julia> m.active = [1:5;];

julia> update!(m)
BayesianLinReg model

Log evidence: -149.04
Prior scale:  1.00
Noise scale:  1.00

Coefficients:
  1:  1.56 ± 0.10
  2:  0.48 ± 0.10
  3:  0.54 ± 0.09
  4:  0.16 ± 0.10
  5: -0.42 ± 0.11


julia> fit!(m)
BayesianLinReg model

Log evidence: -148.69
Prior scale:  0.80
Noise scale:  0.96

Coefficients:
  1:  1.56 ± 0.09
  2:  0.48 ± 0.10
  3:  0.54 ± 0.08
  4:  0.16 ± 0.10
  5: -0.42 ± 0.10

So you could, for example, build some variable selection methods around this.
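
For example, here's a rough sketch of evidence-based selection between two candidate active sets. Note that logEvidence(m) is an assumption on my part for an accessor returning the value printed by fit! (check the package for the exact name), and resetting m.done may or may not be needed:

# Hedged sketch: refit with a given active set and return its log evidence.
function evidenceFor!(m, active)
    m.active = active
    update!(m)
    m.done = false          # assumption: allow fit! to iterate again
    fit!(m)
    return logEvidence(m)   # assumed accessor for the printed log evidence
end

evidenceFor!(m, [1:5;]) > evidenceFor!(m, [1:10;])   # true in the example above: -148.69 > -156.14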

For prediction, we have

julia> predict(m,X[1:10,m.active])
10-element Array{Measurements.Measurement{Float64},1}:
 -0.62 ± 0.99
 -2.18 ± 0.97
  1.21 ± 0.99
  0.33 ± 0.99
 -2.58 ± 0.99
 -1.94 ± 0.98
   1.4 ± 1.0
 -0.05 ± 0.97
 -3.51 ± 0.99
 -0.97 ± 0.98

You can also do this without observation noise:

julia> predict(m,X[1:10,m.active]; noise=false)
10-element Array{Measurements.Measurement{Float64},1}:
 -0.62 ± 0.22
 -2.18 ± 0.14
  1.21 ± 0.24
  0.33 ± 0.24
 -2.58 ± 0.23
 -1.94 ± 0.2
  1.36 ± 0.29
 -0.05 ± 0.14
 -3.51 ± 0.24
 -0.97 ± 0.17
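
As a rough consistency check (assuming the usual Bayesian linear regression predictive variance, i.e. noise variance plus parameter uncertainty), the two sets of standard errors should combine in quadrature:

σ, se = 0.96, 0.22   # noise scale from the last fit!, and the first noise=false SE above
sqrt(σ^2 + se^2)     # ≈ 0.98, close to the 0.99 shown for the first prediction with noise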

Or ignore uncertainty altogether:

julia> predict(m,X[1:10,m.active]; uncertainty=false)
10-element Array{Float64,1}:
 -0.6198367846446444
 -2.1822748489031185
  1.2097200082593464
  0.3324144009375185
 -2.5818035785619786
 -1.9366523399339064
  1.3611242250751907
 -0.0458769531369336
 -3.5130276372775886
 -0.9710314161681255

Finally, when you fit!, there’s an option to pass a callback. This is expected to be a function taking a BayesianLinReg as its argument. One useful thing a callback can do is set m.done = true if some convergence criteria are met. The default is callback = stopAtIteration(10), which does what it says on the tin :slight_smile:
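
For example, here's a hedged sketch of a convergence-based callback. It again assumes a logEvidence(m) accessor for the value printed by fit!, and that the callback is passed as a keyword argument (as the default callback = stopAtIteration(10) suggests):

# Sketch: stop once the log evidence changes by less than `tol` between iterations.
function stopAtTolerance(tol)
    prev = Ref(-Inf)
    return function (m)
        ev = logEvidence(m)                 # assumed accessor, as above
        abs(ev - prev[]) < tol && (m.done = true)
        prev[] = ev
    end
end

fit!(m; callback = stopAtTolerance(1e-6))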


Noob question here. How does this package play with Flux.jl? Can it be used to implement neural network training like Matlab’s trainbr.m?

I'm just getting into Flux and trying to understand how it plays with all of the other packages.


Hi, this package is specific to linear regression. My Soss.jl package will address this, but I haven’t tested it with Flux. For now I’d recommend Turing for this:
https://turing.ml/dev/tutorials/3-bayesnn/
