Hello everyone,
long-time lurker, very occasional poster. I decided to give Julia another try with 1.9. So far I have struggled with the language on and off (I have touched on this previously), but I would really like to eventually get the hang of it. I feel my biggest obstacle is that it is hard for me to find the best, most julian way to write code in Julia.
To get better, I have written a small piece of code that encompasses most of the kind of code I work with on a regular basis. The original code I based it on is a teaching code, focusing more on clarity than speed, but I still think it is a good exercise. Also, the Julia version takes 30% of the time of the original Matlab code, so that is inspiring.
My aim with this topic is to get some general guidance on how I can make this code the best it can be, plus general feedback on what I should watch out for and how to go about it.
Example summary
The code implements Bayesian linear regression, where we have to estimate the parameters \beta and \Sigma (a scalar \sigma^2 in this example).
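Concretely, for the AR(p) example below, the model is

$$
y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_p y_{t-p} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2).
$$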
Thus, the code has the following steps:
- Import data
- Create Y and X
- Set up a big loop where, using the conditional distributions (written out after this list), we:
  - Draw \beta and save it
  - Draw \Sigma and save it
- Do some summaries, plots, etc.
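For reference, the two conditionals, written out to match what genBeta and genSigma do below, are

$$
\beta \mid \sigma^2, y \sim N(B_1, V), \qquad
V = \left(\Sigma_0^{-1} + \sigma^{-2} X'X\right)^{-1}, \qquad
B_1 = V\left(\Sigma_0^{-1}\beta_0 + \sigma^{-2} X'y\right),
$$

$$
\sigma^{-2} \mid \beta, y \sim \mathrm{Gamma}\!\left(\tfrac{\nu_1}{2}, \tfrac{2}{d_1}\right), \qquad
\nu_1 = \nu_0 + T, \qquad
d_1 = d_0 + (y - X\beta)'(y - X\beta),
$$

where Gamma is in the shape/scale parameterization used by Distributions.jl.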
This is pretty much it. My more complex codes usually have many more steps, but the gist is always the same.
The code structure
- I have a module, named NewB, where I keep the 4 functions for my example: `mlag`, `genBeta`, `genSigma`, and `gibbs`, which perform the aforementioned steps 2 and 3.
- The main code, in the file MainNewB.jl, reads the data and defines some preliminary inputs needed for the main module.
I have put the project on GitHub (new to this as well) and replaced the original data with random numbers. If you fork it and prefer the real data, uncomment the relevant lines.
I am attaching the code here.
Using the `@time` macro gives me 0.07 seconds on my machine with 7000 draws total (vs 0.2 s in Matlab, so yay!).
Specific questions
- `gibbs` is the main function. Is the way I save the \beta and \Sigma draws optimal? Typically I have many more parameters to save, and the vectors and matrices are much larger:

```julia
beta_dist = zeros(5,n_gibbs-burn)
if i > burn
    beta_dist[:,i-burn] = beta_d
end
```

Should I preallocate memory for that, as in Matlab? The `@time` macro shows a lot of allocations, especially if I go to 70 000 replications. I know I want to save many matrices, but does this point to a problem? One idea I had is sketched after the timing output below.
```
julia> @time include("MainNewB.jl")
  0.082845 seconds (171.56 k allocations: 68.732 MiB, 18.70% gc time)
```
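The sketch (genBeta_cached is just an illustrative name, not in the repo): since X'X and X'Y never change across iterations, they can be computed once outside the loop instead of inside genBeta on every draw, which should cut a good chunk of the per-iteration allocations:

```julia
# Sketch only: same math as genBeta, but the loop-invariant products X'X and
# X'Y (and the prior precision) are computed once and passed in.
function genBeta_cached(XtX, XtY, invSig, Beta_prior, sig2_d)
    V = inv(invSig + XtX / sig2_d)                      # posterior covariance
    C = cholesky(Hermitian(V))
    Beta1 = V * (invSig * Beta_prior + XtY / sig2_d)    # posterior mean
    return Beta1 + C.L * randn(size(Beta1))             # one draw
end

# In gibbs, before the loop:
#   XtX, XtY, invSig = X' * X, X' * Y, inv(Sigma0)
# and inside the loop:
#   beta_d = genBeta_cached(XtX, XtY, invSig, BETA0, sig2_d)
```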
- I am not sure I understand the output of `@code_warntype` for the `gibbs` function. It highlights a union of Nothing and Tuple at some point. Is this something to look out for?

```
Locals
  @_11::Union{Nothing, Tuple{Int64, Int64}}
```
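For context, I think this union comes from the for loop itself rather than from my code: the loop is lowered to calls of `iterate`, which returns either an (element, state) tuple or `nothing` when the range is exhausted:

```julia
iterate(1:3)       # (1, 1): first element plus iterator state
iterate(1:3, 1)    # (2, 2)
iterate(1:3, 3)    # nothing, which signals the loop is done
```

From what I have read, such small unions are handled efficiently by the compiler, but I would still like confirmation that this one is benign.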
- If I have one-liners for specific tasks (such as drawing sigma), is putting them in a function still a good idea, and is there overhead associated with it? If I understand the performance tip on function barriers correctly, this is always a good idea.
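For example (draw_sig2 is just an illustrative name, assuming `using Distributions` as in the module):

```julia
# A one-liner wrapper around the sigma draw; my understanding is that Julia
# inlines calls this small, so the wrapper itself should add no overhead.
draw_sig2(nu1, d1) = 1 / rand(Gamma(nu1 / 2, 2 / d1))
```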
- Is there a way for module NewB to inherit `using LinearAlgebra` from the calling scope, or is this not a good idea? And what happens if I have two modules that each need the other at the same time?
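Related to this, I came across Reexport.jl, which, if I understand it correctly, goes in the other direction: it makes a module pass along the exports of packages it uses (a sketch, not in my repo):

```julia
module NewB
using Reexport
@reexport using LinearAlgebra   # users of NewB now also get LinearAlgebra's exported names
# ...
end
```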
Code main file:

```julia
using Revise
using LinearAlgebra
using Distributions
using DelimitedFiles
using Statistics      # for mean on arrays (Distributions also re-exports it)
using BenchmarkTools
using NewB

# Real data (uncomment if you prefer it over the random placeholder):
# fdata = readdlm("gdp4795.txt")
# Z = 100*fdata[21:end,:]./fdata[20:end-1,:].-100
# plot(Z)
Z = rand(150,1)

n_gibbs = 7000   # total Gibbs draws
burn = 2000      # burn-in draws to discard
p = 4            # number of lags

(X,Y) = mlag(Z,p)

# Priors and starting values
Sigma0 = I(5)*4000
BETA0 = zeros(5,1)
sig2_d = 0.5
nu0 = 0
d0 = 0

beta_d = genBeta(X,Y,BETA0,Sigma0,sig2_d)
sig2_d = genSigma(Y,X,beta_d,nu0,d0)
(beta_dist, sigma_dist) = gibbs(Y,X,BETA0,Sigma0,sig2_d,d0,nu0,n_gibbs,burn)

# Posterior means
mean(beta_dist,dims=2)
mean(sigma_dist,dims=2)
```
Module NewB.jl:

```julia
module NewB

using LinearAlgebra
using Distributions

export mlag, genBeta, genSigma, gibbs

"""
    mlag(Yfull::Matrix{Float64}, p::Integer)

Creates `p` lags of a matrix for a VAR representation, with a constant
in the first column of the regressor matrix.
"""
function mlag(Yfull::Matrix{Float64}, p::Integer)
    (Tf, n) = size(Yfull)
    X = ones(Tf-p, 1)                     # constant term
    for i = 1:p
        X = [X Yfull[p-i+1:end-i, :]]     # append the i-th lag
    end
    Y = Yfull[p+1:end, :]                 # slicing copies, so Yfull itself is untouched
    return X, Y
end

# Draw beta from its conditional posterior, N(Beta1, V)
function genBeta(X, Y, Beta_prior, Sigma_prior, sig2_d)
    invSig = Sigma_prior^-1
    V = (invSig + sig2_d^(-1)*(X'*X))^-1                # posterior covariance
    C = cholesky(Hermitian(V))
    Beta1 = V*(invSig*Beta_prior + sig2_d^(-1)*X'*Y)    # posterior mean
    beta_d = Beta1 + C.L*randn(5,1)
    return beta_d
end

# Draw sigma^2 by drawing its inverse from a Gamma conditional posterior
function genSigma(Y, X, beta_d, nu0, d0)
    nu1 = size(Y,1) + nu0
    d1 = d0 + only((Y - X*beta_d)'*(Y - X*beta_d))      # sum of squared residuals
    sig2_inv = rand(Gamma(nu1/2, 2/d1))
    sig2_d = 1/sig2_inv
    return sig2_d
end

# Gibbs sampler: alternate draws of beta and sigma^2, keeping post-burn-in draws
function gibbs(Y, X, BETA0, Sigma0, sig2_d, d0, nu0, n_gibbs, burn)
    beta_dist = zeros(5, n_gibbs-burn)    # preallocated storage for beta draws
    sigma_dist = zeros(1, n_gibbs-burn)   # preallocated storage for sigma^2 draws
    for i = 1:n_gibbs
        beta_d = genBeta(X, Y, BETA0, Sigma0, sig2_d)
        sig2_d = genSigma(Y, X, beta_d, nu0, d0)
        if i > burn
            beta_dist[:, i-burn] = beta_d
            sigma_dist[1, i-burn] = sig2_d
        end
    end
    return beta_dist, sigma_dist
end

end # module NewB
```
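For what it is worth, a quick size sanity check for mlag (using the same random placeholder data as in the main file):

```julia
Z = rand(150, 1)
X, Y = mlag(Z, 4)
size(X)   # (146, 5): constant plus 4 lags
size(Y)   # (146, 1)
```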
If you have gotten this far, thank you very much!