Efficient Implementation of Loop for Bootstrapping Statistics

slakeon · June 26, 2019, 2:29am

I am trying to build a loop for a standard recursive-design residual-based nonparametric bootstrap algorithm for a Vector Autoregression (VAR). However, so far, I’m struggling to find an efficient way of writing the algorithm. My current implementation either takes too long to complete a single replication or allocates too much memory.

npbootstrap is the main function which generates bootstrap replicates of the slope parameters of the VAR and the lower Cholesky factor of the covariance matrix of the associated residuals. These would be used later on to compute confidence bands for statistics of interest (e.g. impulse response functions).

Below is a toy example with artificial data and parameter estimates.

using LinearAlgebra, Statistics

pυ = 3 # VAR (lag) order
nυ = 8 # number of variables
Te = 720 # length of vector time series
kζ = nυ * pυ # total number of regressors per equation
Tζ = Te - kζ # degrees of freedom per equation

Y0 = randn(pυ, nυ) # initial conditions
Aυ = randn(nυ * pυ, nυ) # OLS slope parameter estimates
 U = randn(Te, nυ) # residuals

B = 500 # number of bootstrap replications

# main function
function npbootstrap(Y0, Aυ, nυ, pυ, Te, kζ, Tζ, U, B)
    Uc = transform(Te, Tζ, U)
    Ab = Array{Float64,3}(undef, kζ, nυ, B)
    Pb = Array{Float64,3}(undef, nυ, nυ, B)
    for b in 1:B
        Yr = npresample(Y0, Aυ, nυ, pυ, Te, Uc)
        Yr, Zr = regressors(Yr, nυ, pυ, Te)
        Ar = ls(Yr, Zr)
        Ur = resid(Yr, Ar, Zr)
        Σr = covar(Tζ, Ur)
        Pr = cholesky(Symmetric(Σr)).L
        Ab[:, :, b] = Ar
        Pb[:, :, b] = Pr
    end
    Ab, Pb
end

# auxiliary functions
function lag(Yc, nυ, pυ, Te)
    L = Matrix{Float64}(undef, Te, nυ * pυ)
    for i in 1:pυ, j in 1:nυ
        L[:, j + nυ * (i - 1)] = Yc[(pυ + 1 - i):(pυ + Te - i), j]
    end
    L
end

function strim(Yc, pυ)
    Ys = Yc[1:pυ, :]
end

function etrim(Yc, pυ, Te)
    Ye = Yc[(pυ + 1):(pυ + Te), :]
end

function regressors(Yc, nυ, pυ, Te)
    Yb = lag(Yc, nυ, pυ, Te)
     Z = Yb
    Ye = etrim(Yc, pυ, Te)
    Ye, Z
end

function ls(Yc, Z)
    A = Symmetric(Z'Z) \ (Z'Yc)
end

function resid(Yc, A, Z)
    U = Yc - Z * A
end

function covar(Tζ, U)
    Σ = (U'U) / Tζ
end

function transform(Te, Tζ, U)
    Uc = U .- mean(U, dims=1)
    Uc = sqrt(Te / Tζ) * Uc
end

function npresample(Y0, Aυ, nυ, pυ, Te, Uc)
    Yr = Matrix{Float64}(undef, pυ + Te, nυ)
    Yr[1:pυ, :] = Y0
    Ur = Uc[rand(1:Te, Te), :]
    for t in 1:Te
        Yrt = zeros(1, nυ)
        Urt = Ur[t, :]'
        for l in 1:pυ
            Yrtl = Yr[pυ + t - l, :]'
             Aυl = Aυ[(nυ * (l - 1) + 1):(nυ * l), :]
            Yrt += Yrtl * Aυl
        end
        Yr[pυ + t, :] = Yrt + Urt
    end
    Yr
end

using BenchmarkTools

@benchmark npbootstrap(Y0, Aυ, nυ, pυ, Te, kζ, Tζ, U, B)
BenchmarkTools.Trial:
  memory estimate:  1.03 GiB
  allocs estimate:  4777010
  --------------
  minimum time:     658.557 ms (5.69% GC)
  median time:      727.883 ms (5.17% GC)
  mean time:        721.912 ms (6.41% GC)
  maximum time:     767.051 ms (5.34% GC)
  --------------
  samples:          7
  evals/sample:     1

Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = "C:\Users\User1\AppData\Local\atom\app-1.38.2\atom.exe"  -a
  JULIA_NUM_THREADS = 4

Tamas_Papp · June 26, 2019, 4:40am

Have you looked at Performance Tips · The Julia Language , especially about

checking type stability with @code_warntype, and
profiling your code?

slakeon · June 26, 2019, 1:37pm

Sure! I did check some parts of the Performance Tips sections.

By adding the @views macro before every loop in the main and the auxiliary functions, memory allocation is halved and computing time is reduced by approx. one third. @inbounds improved only marginally on this. Nevertheless, it still stakes too long to compute.

I was wondering if any of the functions can be improved, in particular, regarding either variable preallocation or the order of indexing of the different arrays. Any help would be greatly appreciated.

Tamas_Papp · June 26, 2019, 4:00pm

This is why I recommended profiling, and checking type stability. You get more useful information if you break up your code into smaller functions.

ElOceanografo · June 26, 2019, 4:28pm

Running

@code_warntype npbootstrap(Y0, Aυ, nυ, pυ, Te, kζ, Tζ, U, B)

your type-stability looks OK, so preallocation is the first thing I’d look at. Running 500 bootstrap replicates is allocating a gigabyte of memory, which seems really high. Just skimming your code, I noticed, for instance, the first lines of the npresample and lag function allocate new arrays and are called at each bootstrap iteration.

As a general suggestion, you might try defining a custom struct or two to hold the relevant state variables. I find this can help me reason more easily about what can be preallocated and updated in place…

slakeon · June 27, 2019, 11:39pm

I’m going to try to implement that to see if it helps.

Topic		Replies	Views
Performance issue with this bootstrapp Performance	11	524	June 29, 2021
Why is my code very slow General Usage performance	16	680	June 23, 2023
Pre-allocating a structure to reduce memory? New to Julia preallocation	14	2545	August 16, 2022
Updating an array inside a for loop- comparison with MATLAB General Usage	17	3744	September 21, 2017
Generators vs loops vs broadcasting: Calculate PI via Monte Carlo Sampling Performance	7	1501	June 20, 2021

Efficient Implementation of Loop for Bootstrapping Statistics

Related topics