Efficient Implementation of Loop for Bootstrapping Statistics

I am trying to build a loop for a standard recursive-design residual-based nonparametric bootstrap algorithm for a Vector Autoregression (VAR). However, so far, I’m struggling to find an efficient way of writing the algorithm. My current implementation either takes too long to complete a single replication or allocates too much memory.

npbootstrap is the main function which generates bootstrap replicates of the slope parameters of the VAR and the lower Cholesky factor of the covariance matrix of the associated residuals. These would be used later on to compute confidence bands for statistics of interest (e.g. impulse response functions).

Below is a toy example with artificial data and parameter estimates.

using LinearAlgebra, Statistics

pυ = 3 # VAR (lag) order
nυ = 8 # number of variables
Te = 720 # length of vector time series
kζ = nυ * pυ # total number of regressors per equation
Tζ = Te - kζ # degrees of freedom per equation

Y0 = randn(pυ, nυ) # initial conditions
Aυ = randn(nυ * pυ, nυ) # OLS slope parameter estimates
 U = randn(Te, nυ) # residuals

B = 500 # number of bootstrap replications

# main function
function npbootstrap(Y0, Aυ, nυ, pυ, Te, kζ, Tζ, U, B)
    Uc = transform(Te, Tζ, U)
    Ab = Array{Float64,3}(undef, kζ, nυ, B)
    Pb = Array{Float64,3}(undef, nυ, nυ, B)
    for b in 1:B
        Yr = npresample(Y0, Aυ, nυ, pυ, Te, Uc)
        Yr, Zr = regressors(Yr, nυ, pυ, Te)
        Ar = ls(Yr, Zr)
        Ur = resid(Yr, Ar, Zr)
        Σr = covar(Tζ, Ur)
        Pr = cholesky(Symmetric(Σr)).L
        Ab[:, :, b] = Ar
        Pb[:, :, b] = Pr
    end
    Ab, Pb
end
# auxiliary functions
function lag(Yc, nυ, pυ, Te)
    L = Matrix{Float64}(undef, Te, nυ * pυ)
    for i in 1:pυ, j in 1:nυ
        L[:, j + nυ * (i - 1)] = Yc[(pυ + 1 - i):(pυ + Te - i), j]
    end
    L
end

function strim(Yc, pυ)
    Ys = Yc[1:pυ, :]
end

function etrim(Yc, pυ, Te)
    Ye = Yc[(pυ + 1):(pυ + Te), :]
end

function regressors(Yc, nυ, pυ, Te)
    Yb = lag(Yc, nυ, pυ, Te)
     Z = Yb
    Ye = etrim(Yc, pυ, Te)
    Ye, Z
end

function ls(Yc, Z)
    A = Symmetric(Z'Z) \ (Z'Yc)
end

function resid(Yc, A, Z)
    U = Yc - Z * A
end

function covar(Tζ, U)
    Σ = (U'U) / Tζ
end

function transform(Te, Tζ, U)
    Uc = U .- mean(U, dims=1)
    Uc = sqrt(Te / Tζ) * Uc
end

function npresample(Y0, Aυ, nυ, pυ, Te, Uc)
    Yr = Matrix{Float64}(undef, pυ + Te, nυ)
    Yr[1:pυ, :] = Y0
    Ur = Uc[rand(1:Te, Te), :]
    for t in 1:Te
        Yrt = zeros(1, nυ)
        Urt = Ur[t, :]'
        for l in 1:pυ
            Yrtl = Yr[pυ + t - l, :]'
             Aυl = Aυ[(nυ * (l - 1) + 1):(nυ * l), :]
            Yrt += Yrtl * Aυl
        end
        Yr[pυ + t, :] = Yrt + Urt
    end
    Yr
end
using BenchmarkTools

@benchmark npbootstrap(Y0, Aυ, nυ, pυ, Te, kζ, Tζ, U, B)
BenchmarkTools.Trial:
  memory estimate:  1.03 GiB
  allocs estimate:  4777010
  --------------
  minimum time:     658.557 ms (5.69% GC)
  median time:      727.883 ms (5.17% GC)
  mean time:        721.912 ms (6.41% GC)
  maximum time:     767.051 ms (5.34% GC)
  --------------
  samples:          7
  evals/sample:     1
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "C:\Users\User1\AppData\Local\atom\app-1.38.2\atom.exe"  -a
  JULIA_NUM_THREADS = 4
1 Like

Have you looked at https://docs.julialang.org/en/v1/manual/performance-tips/ , especially about

  1. checking type stability with @code_warntype, and

  2. profiling your code?

Sure! I did check some parts of the Performance Tips sections.

By adding the @views macro before every loop in the main and the auxiliary functions, memory allocation is halved and computing time is reduced by approx. one third. @inbounds improved only marginally on this. Nevertheless, it still stakes too long to compute.

I was wondering if any of the functions can be improved, in particular, regarding either variable preallocation or the order of indexing of the different arrays. Any help would be greatly appreciated.

This is why I recommended profiling, and checking type stability. You get more useful information if you break up your code into smaller functions.

Running

@code_warntype npbootstrap(Y0, Aυ, nυ, pυ, Te, kζ, Tζ, U, B)

your type-stability looks OK, so preallocation is the first thing I’d look at. Running 500 bootstrap replicates is allocating a gigabyte of memory, which seems really high. Just skimming your code, I noticed, for instance, the first lines of the npresample and lag function allocate new arrays and are called at each bootstrap iteration.

As a general suggestion, you might try defining a custom struct or two to hold the relevant state variables. I find this can help me reason more easily about what can be preallocated and updated in place…

1 Like

I’m going to try to implement that to see if it helps.