Learning Julia by an example of a simple Bayesian linear regression

Hello everyone,

Long-time lurker, very occasional poster. I decided to give Julia a try once more with 1.9. So far I have struggled with the language on and off (I have touched on this previously) but would really like to eventually get the hang of it. I feel that my biggest obstacle is that it's hard for me to find the best way to write Julian code in Julia :slight_smile:

To get better, I have written a small piece of code that is an example encompassing most of the code I have to work with on a regular basis. The original code from which I wrote this one is teaching code - focused more on clarity than speed - but I still think it is a good exercise. Also, the Julia version runs in about 30% of the time of the original Matlab code, so that is inspiring.

My aim with this topic is to get some general guidance on how I can make this code the best it can be, and general feedback on what I should watch out for and how to go about it.

Example summary

The model in the code is a Bayesian linear regression:

Y = X \beta + \varepsilon \qquad \varepsilon \sim N(0,\Sigma)

where we have to estimate the parameters \beta and \Sigma.

Thus, the code has the following steps:

  1. Import data
  2. Create Y and X
  3. Set up a big loop where, drawing from the conditional distributions (written out just after this list), we:
    1. Draw \beta and save it
    2. Draw \Sigma and save it
  4. Do some summaries, plots, etc.
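For reference, the two conditionals in step 3 are the standard conjugate ones for this model (in my example the error variance is a scalar \sigma^2, so \Sigma = \sigma^2 I, and the Gamma is in the shape-scale parametrization):

\beta \mid \sigma^2, Y \sim N(\bar\beta, V), \qquad V = \left(\Sigma_0^{-1} + \sigma^{-2} X'X\right)^{-1}, \qquad \bar\beta = V\left(\Sigma_0^{-1}\beta_0 + \sigma^{-2} X'Y\right)

\sigma^{-2} \mid \beta, Y \sim \mathrm{Gamma}\!\left(\frac{\nu_0 + T}{2},\ \frac{2}{d_0 + (Y - X\beta)'(Y - X\beta)}\right)

where T is the sample size. This is exactly what genBeta and genSigma below compute.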

This is pretty much it. My more complex codes usually have many more steps, but the gist is always the same.

The code structure

  • I have a module, named NewB, where I keep the 4 functions for my example - mlag, genBeta, genSigma, gibbs - which perform the aforementioned steps 2 and 3.
  • The main code, in the file MainNewB.jl, reads the data and defines some preliminary stuff needed for the main module.

I have put the project on github (new to this as well) and replaced the original data with random numbers. If you fork it and prefer the real data, uncomment the relevant lines.

I am attaching the code here.

Using the @time macro gives me 0.07 seconds on my machine with 7000 draws total (vs 0.2 in Matlab, so yay!).

Specific questions

  1. gibbs is the main function. Is the way I save \beta and \Sigma optimal? Typically I have many more parameters to save, and the vectors and matrices are much larger:
beta_dist = zeros(5, n_gibbs - burn)
if i > burn
    beta_dist[:, i-burn] = beta_d
end

Should I preallocate memory for that, as in Matlab? The @time macro shows a lot of allocations, especially if I go to 70 000 replications. I know I want to save many matrices, but does this point to a problem?

julia> @time include("MainNewB.jl")
  0.082845 seconds (171.56 k allocations: 68.732 MiB, 18.70% gc time)
  2. I am not sure I understand the output of @code_warntype for the gibbs function. It highlights a Union of Nothing and Tuple at some point. Is this something to look out for?
Locals
  @_11::Union{Nothing, Tuple{Int64, Int64}}
  3. If I have one-liners for specific tasks (such as drawing sigma), is putting them in a function still a good idea, or is there overhead associated with it? If I understand the performance tip on function barriers correctly, this is always a good idea.
  4. Is there a way for module NewB to inherit using LinearAlgebra, or is this not a good idea? And what happens if I have two modules that need each other at the same time?

The main file, MainNewB.jl:

using Revise
using LinearAlgebra
using Distributions
using DelimitedFiles
using BenchmarkTools
using NewB

# fdata = readdlm("gdp4795.txt")
# Z = 100*fdata[21:end,:]./fdata[20:end-1,:].-100
# plot(Z)
Z = rand(150, 1)

n_gibbs = 7000
burn    = 2000
p       = 4

(X, Y) = mlag(Z, p)

Sigma0 = I(5) * 4000
BETA0  = zeros(5, 1)
sig2_d = 0.5
nu0    = 0
d0     = 0

beta_d = genBeta(X, Y, BETA0, Sigma0, sig2_d)
sig2_d = genSigma(Y, X, beta_d, nu0, d0)
(beta_dist, sigma_dist) = gibbs(Y, X, BETA0, Sigma0, sig2_d, d0, nu0, n_gibbs, burn)

mean(beta_dist, dims=2)
mean(sigma_dist, dims=2)

Module NewB.jl:

module NewB
using LinearAlgebra
using Distributions
export mlag, genBeta, genSigma, gibbs

"""
    mlag(Yfull::Matrix{Float64},p::Integer)
    Creates lags of a matrix for a VAR representation with a constant on the left
"""
function mlag(Yfull::Matrix{Float64},p::Integer)
    (Tf, n) = size(Yfull)
    X = ones(Tf-p,1)
    for i = 1:p
        X = [X Yfull[p-i+1:end-i,:]]
    end
    Y = Yfull[p+1:end,:]
    return X, Y             # this changes the array passed into the function
end


function genBeta(X, Y, Beta_prior, Sigma_prior, sig2_d)
    invSig = Sigma_prior^-1
    V = (invSig + sig2_d^(-1) * (X' * X))^-1                   # posterior covariance
    C = cholesky(Hermitian(V))
    Beta1 = V * (invSig * Beta_prior + sig2_d^(-1) * X' * Y)   # posterior mean
    beta_d = Beta1 + C.L * randn(5, 1)                         # one draw from N(Beta1, V)
    return beta_d
end

function genSigma(Y, X, beta_d, nu0, d0)
    nu1 = size(Y, 1) + nu0                                  # posterior degrees of freedom
    d1  = d0 + only((Y - X * beta_d)' * (Y - X * beta_d))   # posterior scale
    sig2_inv = rand(Gamma(nu1 / 2, 2 / d1), 1)              # draw the precision
    sig2_d = 1 / only(sig2_inv)
    return sig2_d
end

function gibbs(Y, X, BETA0, Sigma0, sig2_d, d0, nu0, n_gibbs, burn)
    beta_dist  = zeros(5, n_gibbs - burn)    # preallocated storage for the kept draws
    sigma_dist = zeros(1, n_gibbs - burn)
    for i = 1:n_gibbs
        beta_d = genBeta(X, Y, BETA0, Sigma0, sig2_d)
        sig2_d = genSigma(Y, X, beta_d, nu0, d0)
        if i > burn                          # keep only the post-burn-in draws
            beta_dist[:, i-burn]  = beta_d
            sigma_dist[1, i-burn] = sig2_d
        end
    end
    return beta_dist, sigma_dist
end

end # module NewB

If you have gotten this far, thank you very much!

Because genBeta and genSigma are called on every iteration, you probably want to benchmark them carefully and profile both execution time and memory allocations in them. In genSigma, for example, you evaluate the residual, Y - X * beta_d, twice and create a matrix product that will be 1 by 1, then extract the scalar value. That could be replaced by

    resid = Y - X * beta_d
    d1 = d0 + dot(resid, resid)

or

    d1 = d0 + sum(abs2, resid)

Similarly, sig2_d can be written

    sig2_d = inv(rand(Gamma(nu1 / 2, 2 / d1)))

However, I suspect that the time is being spent in genBeta, and there the construction of V is likely the bottleneck, as that one line violates the first two of Nick Higham’s seven sins of numerical linear algebra. The fact that a formula is written in terms of an inverse of a matrix constructed from the inverse of an X'X-like term doesn’t mean it is a good idea to calculate it that way.

It is likely that the triangular factor C.L can be determined from a QR factorization of X directly.
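Something along these lines (an untested sketch - the name genBetaQR and the particular stacking are just my illustration of the idea): stack the scaled data on top of a factor of the prior precision so that A'A equals the posterior precision, then a single QR factorization yields the posterior mean as a least-squares solution and an implicit square root of V for sampling, without ever forming V or its inverse.

    using LinearAlgebra

    function genBetaQR(X, Y, Beta_prior, Sigma_prior, sig2_d)
        s  = sqrt(sig2_d)
        L0 = cholesky(Hermitian(Matrix(inv(Sigma_prior)))).L   # L0 * L0' == inv(Sigma_prior)
        A  = [X ./ s; L0']                 # A'A == inv(Sigma_prior) + X'X / sig2_d == inv(V)
        b  = [Y ./ s; L0' * Beta_prior]    # matching stacked right-hand side
        F  = qr(A)
        Beta1 = F \ b                      # posterior mean via a least-squares solve
        # F.R' * F.R == inv(V), so R \ z with z ~ N(0, I) has covariance exactly V
        return Beta1 + UpperTriangular(F.R) \ randn(size(X, 2), 1)
    end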

In each call to genBeta you are creating the return vector, which is then copied into a column of beta_dist in the gibbs function. A common Julia idiom is to define a mutating version of a function, say genBeta!, which takes as an argument beta_d and overwrites its contents, which are then returned. That way you can reuse a working vector declared in the gibbs function.
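For example (untested sketch, reusing your current genBeta body): the caller allocates beta_d once, and genBeta! overwrites it through .= on every call. Note that this only removes the allocation of the result vector; the intermediates (invSig, V, C, Beta1) still allocate and would be the next target.

    function genBeta!(beta_d, X, Y, Beta_prior, Sigma_prior, sig2_d)
        invSig = Sigma_prior^-1
        V = (invSig + sig2_d^(-1) * (X' * X))^-1
        C = cholesky(Hermitian(V))
        Beta1 = V * (invSig * Beta_prior + sig2_d^(-1) * X' * Y)
        beta_d .= Beta1 .+ C.L * randn(5, 1)   # write into the existing array
        return beta_d
    end

    # in gibbs:
    # beta_d = zeros(5, 1)                            # allocated once, before the loop
    # genBeta!(beta_d, X, Y, BETA0, Sigma0, sig2_d)   # called every iteration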

4 Likes

Oh, those are some nice tips, thank you. I will incorporate them and report my journey here.

This tip sounds invaluable - I doubt I would have found it in a document somewhere (or paid attention if I had). The thing is that my real codes are just versions of this one with more things going on, so such best practices are exactly what I am looking for.

Thank you, I had the opportunity to try out some of the suggestions.

For example, the resid = Y - X * beta_d trick reduced the allocations in genSigma from 7 to 4. I will watch out for such points in the future.

Mine:
julia> @time genSigma(Y,X,beta_d,nu0,d0)
  0.000013 seconds (7 allocations: 5.016 KiB)
0.10240049561807199

Dot trick:
julia> @time genSigma(Y,X,beta_d,nu0,d0)
  0.000016 seconds (4 allocations: 2.516 KiB)
0.08709325471913029

and simply that reduced the total run time by almost a half (and the allocations by a lot)!

julia> @time include("MainNewB.jl")
  0.048585 seconds (150.56 k allocations: 51.639 MiB, 9.80% gc time)
vs
julia> @time include("MainNewB.jl")
  0.074861 seconds (171.56 k allocations: 68.732 MiB, 11.86% gc time)

I am getting down to run times where I should start using BenchmarkTools.
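If I read the BenchmarkTools docs correctly, that would look something like this, with $ interpolating the globals so that accessing them is not part of the measurement:

using BenchmarkTools
@btime genSigma($Y, $X, $beta_d, $nu0, $d0)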

I am unsure how to incorporate the "!" suggestion (just changing the values of beta_d instead of re-allocating it). I tried following the advice in this discussion, more specifically to use x = f(x) as in this post.

I tried pre-defining beta_d in gibbs like this:

function gibbs(Y,X,BETA0,Sigma0,sig2_d,d0,nu0,n_gibbs,burn)
    beta_d = zeros(5,1)
    ...
    for i = 1:n_gibbs
        beta_d = genBeta(X,Y,BETA0,Sigma0,sig2_d,beta_d)
        ...
    end

and then, in genBeta, assigning to beta_d’s elements, but that doesn’t seem to be it:

beta_d[:,1] = Beta1 + C.L*randn(5,1)

Alternatively, I tried the broadcasted beta_d .= version.

What is the correct way to go about this? Even just a reference to the manual would help; I couldn’t easily find documentation on functions with the "!" naming convention. I looked at the broadcasting part of the manual and thought I could follow it, but obviously I didn’t implement it well, as I don’t see any change in the allocations or the time.

Hey there @Boris,

It’s your lucky day - blabbering about Julia best practices is a favorite activity of mine. You’ll find some information in the official docs, for instance

But it’s a bit buried and not necessarily easy to dive into. I tried to reformulate some of this advice in my (WIP) IntroJulia. In particular, check out IntroJulia - package dev and IntroJulia - performance. It sums up most of the generic advice I could give you.

1 Like

Now onto some of your specific questions:

Usually this denotes iteration, and you’re fine. Anything that shows up blue in @code_warntype is not a cause for worry. If you want an output that more closely matches your source code (instead of the low level representation that is harder to decipher), check out GitHub - JuliaDebug/Cthulhu.jl: The slow descent into madness. A bit scary at first but remarkably easy to get a handle on. Basically a @code_warntype on steroids with arbitrary depth.
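To see where it comes from: a for loop is lowered to calls to iterate, which returns either nothing (iteration finished) or a (value, state) tuple - hence the small Union, which the compiler handles efficiently. Roughly:

function manual_loop(itr)
    # this is approximately what `for x in itr ... end` lowers to; `next` is
    # the Union{Nothing, Tuple{...}} local that @code_warntype shows
    next = iterate(itr)
    while next !== nothing
        (x, state) = next
        println(x)                 # loop body
        next = iterate(itr, state)
    end
end

manual_loop(1:3)                   # prints 1, 2, 3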

You also have to take clarity into account here. Separate functions are usually a good thing… up to a point. Not sure what that one-liner is, but there are many situations where sigma = rand(Normal(0, 1)) would be clearer than sigma = draw_sigma() (dumb example). So if it doesn’t affect your code’s speed, you don’t always need to bother.

What do you mean by “inherit”? You already put using LinearAlgebra in your NewB module, and if you need it in your main file then put it there again. The module is designed to be reusable by someone else, who may not import LinearAlgebra in their main file after all.
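To make it concrete, here is a self-contained toy example (module and function names made up):

module MyModule
using LinearAlgebra        # the module declares its own dependency
export mynorm
mynorm(x) = norm(x)        # uses LinearAlgebra.norm internally
end

using .MyModule
mynorm([3.0, 4.0])         # works (returns 5.0) without `using LinearAlgebra` here
# norm([3.0, 4.0])         # would error unless this file also does `using LinearAlgebra`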

As for the actual code, since you said you’re a GitHub newbie and I’m an evil person, I’m gonna open some GitHub issues to get you used to them :smiling_imp:

See the GitHub issues I opened

1 Like

Just to be sure: you do know about Turing.jl, right? :slight_smile:
I assume that the goal of this post is to get better acquainted with the dos and don’ts of Julia performance-wise; however, I just wanted to be sure that you knew about it.

1 Like

Oh boy, thank you! I will go through it, hopefully this week. I have actually read tons of stuff about Julia over the years, but because I do it sporadically (in intense intervals quite far apart) I get a lot of diminishing returns.

The docs are extremely dense - every sentence has so much information in it that it becomes tiring after a while. Especially when every line takes 10-15 minutes, and after 30 minutes I don’t feel like I have made any progress. And because I am doing all this on the side in my free time, I am not getting very far ahead. Even the note-taking is time consuming. I understand it doesn’t work that way; I am just annoyed that I don’t get it the first time :smiley:

1 Like

I actually had no idea, thank you! I mean, I’ve seen the package name but never really checked it out.

But yes, my goal is to actually learn Julia. My current research is about Bayesian mixed-frequency vector autoregressions, using them for economic forecasts and nowcasts, and I will be implementing those. For example, I have this working paper already implemented in Matlab.

But I am happy to look into other people’s code - I learn a lot that way - so I will see what Turing has to offer and whether I can directly implement stuff from there.

Hahah, that’s the best way to learn, so please do, I will definitely play along!

The longer post I still have to digest; I will reply to it later.

1 Like