Generating high-dimensional Sobol sequences

jocklawrie · March 6, 2019, 5:39am

Hi there,

I am using Sobol sequences to generate say 10,000 draws from the Multivariate Normal with dimension 100, mean vector 0 and covariance matrix equal to the identity matrix.

Generating the sequence is fine, but I find that the distribution of the resulting points is highly concentrated. That is, the density of each draw, relative to that of the other draws, is almost zero for 99% of points, then several orders of magnitude higher for just a few points. I’d expect the relative densities to be a little more even.

Example below. Any ideas what I’m doing wrong?

Thanks

using Sobol
using Distributions
using UnicodePlots

function generate_z(K::Int, nr::Int)
    # Initialize result with draws from the Uniform distribution over the unit cube [0, 1]^K.
    result = fill(0.0, K, nr)
    s = SobolSeq(K)
    skip(s, nr)
    for r = 1:nr
        result[:, r] .= next!(s)
    end

    # Convert to draws from MVN(z | 0_k, I_k) using the inverse CDF
    d = Normal(0.0, 1.0)
    for r = 1:nr
        for i = 1:K
            result[i, r] = quantile(d, result[i, r])
        end
    end
    result
end

function zdensity(z::Array{Float64, 2})
    K, nr  = size(z)
    result = fill(1.0, nr)
    d      = Normal(0.0, 1.0)
    totaldens = 0.0
    for r = 1:nr
        for i = 1:K
            result[r] *= pdf(d, z[i, r])
        end
        totaldens += result[r]
    end
    result ./= totaldens
end

z = generate_z(100, 10_000);
dens = zdensity(z)
lineplot(dens)
describe(dens)

Tamas_Papp · March 6, 2019, 8:47am

First, I would use the sum of log pdfs, which is much less prone to numerical error. These look fine to me when compared with random normals. Also simple statistics look ok.

using Statistics, LinearAlgebra
extrema(mapslices(mean, z; dims = 2))
extrema(mapslices(std, z; dims = 2))
z_log_density(z) = normalize!(vec(sum(logpdf.(Ref(Normal(0, 1)), z); dims = 1)), 1)
z = generate_z(100, 10_000)
z_dens = z_log_density(z)
sim_z = randn(size(z))
sim_z_dens = z_log_density(sim_z)
ps = 0:0.1:1
quantile(z_dens, ps)
quantile(sim_z_dens, ps)

yields

julia> extrema(mapslices(mean, z; dims = 2))
(-0.0007172362463843937, 0.0009932045561835572)

julia> extrema(mapslices(std, z; dims = 2))
(0.999483368783979, 1.0003834076282714)

julia> quantile(z_dens, ps)
11-element Array{Float64,1}:
 -0.00012070140687908017
 -0.0001063122814842569 
 -0.00010400024477877038
 -0.00010240952128016883
 -0.00010104242015569829
 -9.979156638516813e-5  
 -9.854879329589064e-5  
 -9.727505743865635e-5  
 -9.584836020836684e-5  
 -9.393933488262388e-5  
 -8.430228880599951e-5  

julia> quantile(sim_z_dens, ps)
11-element Array{Float64,1}:
 -0.00012556606255873136
 -0.00010655950256849811
 -0.0001041351251443149 
 -0.0001024130143312689 
 -0.00010104015366172799
 -9.97620830194371e-5   
 -9.849225327092044e-5  
 -9.72325802687073e-5   
 -9.577554898483448e-5  
 -9.377537093476237e-5  
 -8.342424675283704e-5

jocklawrie · March 7, 2019, 4:54am

Thanks for responding.

I don’t think numerical error is at play here. While it is true that across each of the 100 dimensions we get a good (low-discrepancy) sample from the univariate standard normal, I think we are seeing a natural consequence of high-dimensional sampling…that is, a handful of points dominating the distribution in terms of density.

I suspect this is unavoidable because within a draw all of the 100 component densities are less than 0.4 (pdf(Normal(0, 1), 0)), and their product is usually very small. But sometimes (rarely) most of the 100 component densities will be relatively close to 0.4 (or at least on the order of 10^-1), which results in a much larger density than that of most draws…several orders of magnitude in fact.

Topic		Replies	Views
Simulation of super high-dimensional distribution Statistics question	3	1116	March 24, 2020
Random draws of multivariate normal with positive semi-definite covariance matrix Statistics linearalgebra	4	1093	March 26, 2022
Reducing allocations for repeatedly generating MvNormal distributions Performance	2	448	May 16, 2020
How to solve Path Integral Problems with Probabilistic Programming Techniques? Probabilistic programming	8	249	March 22, 2023
Evaluating a MvNormal density with a sparse covariance matrix Performance question	1	185	May 1, 2023

Generating high-dimensional Sobol sequences

Related topics