# Generating high-dimensional Sobol sequences

Hi there,

I am using Sobol sequences to generate say 10,000 draws from the Multivariate Normal with dimension 100, mean vector 0 and covariance matrix equal to the identity matrix.

Generating the sequence is fine, but I find that the distribution of the resulting points is highly concentrated. That is, the density of each draw, relative to that of the other draws, is almost zero for 99% of points, then several orders of magnitude higher for just a few points. I’d expect the relative densities to be a little more even.

Example below. Any ideas what I’m doing wrong?

Thanks

``````using Sobol
using Distributions
using UnicodePlots

function generate_z(K::Int, nr::Int)
# Initialize result with draws from the Uniform distribution over the unit cube [0, 1]^K.
result = fill(0.0, K, nr)
s = SobolSeq(K)
skip(s, nr)
for r = 1:nr
result[:, r] .= next!(s)
end

# Convert to draws from MVN(z | 0_k, I_k) using the inverse CDF
d = Normal(0.0, 1.0)
for r = 1:nr
for i = 1:K
result[i, r] = quantile(d, result[i, r])
end
end
result
end

function zdensity(z::Array{Float64, 2})
K, nr  = size(z)
result = fill(1.0, nr)
d      = Normal(0.0, 1.0)
totaldens = 0.0
for r = 1:nr
for i = 1:K
result[r] *= pdf(d, z[i, r])
end
totaldens += result[r]
end
result ./= totaldens
end

z = generate_z(100, 10_000);
dens = zdensity(z)
lineplot(dens)
describe(dens)
``````

First, I would use the sum of log pdfs, which is much less prone to numerical error. These look fine to me when compared with random normals. Also simple statistics look ok.

``````using Statistics, LinearAlgebra
extrema(mapslices(mean, z; dims = 2))
extrema(mapslices(std, z; dims = 2))
z_log_density(z) = normalize!(vec(sum(logpdf.(Ref(Normal(0, 1)), z); dims = 1)), 1)
z = generate_z(100, 10_000)
z_dens = z_log_density(z)
sim_z = randn(size(z))
sim_z_dens = z_log_density(sim_z)
ps = 0:0.1:1
quantile(z_dens, ps)
quantile(sim_z_dens, ps)
``````

yields

``````julia> extrema(mapslices(mean, z; dims = 2))
(-0.0007172362463843937, 0.0009932045561835572)

julia> extrema(mapslices(std, z; dims = 2))
(0.999483368783979, 1.0003834076282714)

julia> quantile(z_dens, ps)
11-element Array{Float64,1}:
-0.00012070140687908017
-0.0001063122814842569
-0.00010400024477877038
-0.00010240952128016883
-0.00010104242015569829
-9.979156638516813e-5
-9.854879329589064e-5
-9.727505743865635e-5
-9.584836020836684e-5
-9.393933488262388e-5
-8.430228880599951e-5

julia> quantile(sim_z_dens, ps)
11-element Array{Float64,1}:
-0.00012556606255873136
-0.00010655950256849811
-0.0001041351251443149
-0.0001024130143312689
-0.00010104015366172799
-9.97620830194371e-5
-9.849225327092044e-5
-9.72325802687073e-5
-9.577554898483448e-5
-9.377537093476237e-5
-8.342424675283704e-5
``````

Thanks for responding.

I don’t think numerical error is at play here. While it is true that across each of the 100 dimensions we get a good (low-discrepancy) sample from the univariate standard normal, I think we are seeing a natural consequence of high-dimensional sampling…that is, a handful of points dominating the distribution in terms of density.

I suspect this is unavoidable because within a draw all of the 100 component densities are less than 0.4 (`pdf(Normal(0, 1), 0)`), and their product is usually very small. But sometimes (rarely) most of the 100 component densities will be relatively close to 0.4 (or at least on the order of 10^-1), which results in a much larger density than that of most draws…several orders of magnitude in fact.