Hey,
Consider a random dataset:
using Distributions
using Random
using Statistics
using StatsBase
X = rand(LogNormal(0,1),10000)
Consider a first function that ‘summarises’ the dataset into a simple vector of k statistics (here k = 16 for this simple example):
function moms(x)
    # k = 16 statistics: column means of x^p * exp(-x) for p = 0,…,15 (returns a 1×16 row matrix)
    return mean(x .^ (0:15)' .* exp.(-x), dims=1)
end
m = moms(X)
and a second function that does a lot of heavy computations on these statistics, which happens to also output a k-sized vector:
function heavy_computations(m)
    # do a lot of stuff with m (placeholder for the real, expensive computation)
    result = log.(1 .+ abs.(m)) .+ 1
    return result
end
r = heavy_computations(m)
Now I want to assess the empirical covariance of the final k statistics. I could bootstrap:
function boot(M, data, f)
    n_obs = length(data)
    r = zeros((M, length(f(data))))
    for i in 1:M
        # resample the observations with replacement and recompute the statistics
        r[i, :] = f(data[sample(1:n_obs, n_obs, replace=true)])
    end
    return r
end
M = 1000
cov_r = cov(boot(M,X,heavy_computations ∘ moms))
but it computes the heavy_computations function M times! It is, on the other hand, much faster to only compute the covariance of m:
cov_m = cov(boot(1000,X,moms))
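Just to make the cost asymmetry concrete, here is a quick throwaway check (heavy_calls and counted_heavy are just ad-hoc names for this illustration, not part of my real code):
heavy_calls = Ref(0)
counted_heavy(m) = (heavy_calls[] += 1; heavy_computations(m))
boot(M, X, counted_heavy ∘ moms)
heavy_calls[]  # M + 1 = 1001 calls (boot evaluates f once more just to size its output)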
Could I leverage the fact that the heavy_computations function is written in Julia, a little like Measurements.jl but multivariate?
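To illustrate what I have in mind: something like the multivariate delta method, where an AD Jacobian of heavy_computations is used to push cov_m forward. A rough sketch, assuming ForwardDiff is a reasonable tool here (this may well not be the right approach, hence the question):
using ForwardDiff
# Jacobian of heavy_computations at the point estimate, with m flattened to a length-k vector
J = ForwardDiff.jacobian(heavy_computations, vec(m))
# first-order propagation: cov(heavy_computations(m)) ≈ J * cov(m) * J'
cov_r_approx = J * cov_m * J'
As far as I understand, this linearisation is essentially what Measurements.jl does for scalar uncertainties; is there a package (or a better pattern) that does it for a full covariance matrix?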