QMC vs rand

tchebycheff · November 14, 2023, 10:00am

X_t is a d-dimensional geometric Brownian process and Y_t = \min_i X_t^{(i)}. My objective is to simulate paths of Y_t at t=\Delta, 2\Delta, ..., M\Delta.

There are significant differences in E(Y_t) estimates from QMC/Sobol & MC using Distributions.jl. E(Y_t) should decrease with t, however this is not the case with QMC result regardless of the sample size. I thought it was Julia specific, however, I see similar differences using Python/scipy-stats.qmc as well.

What could be the issue with the QMC approach? I would greatly appreciate suggestions, comments. Thanks!

using Distributions, QuasiMonteCarlo, StatsFuns, LinearAlgebra

function mvn_qmc(μ, Σ, samples)
    d = length(μ)
    L = cholesky(Σ).L
    p = ceil(Int, log2(samples))
    z = QuasiMonteCarlo.sample(2^p,d,
        SobolSample(R = OwenScramble(base = 2, pad = 32)))[:,1:samples]
    z .= norminvcdf.(z)
    return μ .+ L * z
end

function min_paths_qmc(μ, Σ, paths, pathlen)
    Δ = 1.0/pathlen
    r = mvn_qmc((μ .- 0.5*diag(Σ))*Δ, Σ*Δ, paths*pathlen)
    r = reshape(r, :, pathlen, paths)
    cumsum!(r, r, dims=2)
    w = reshape(minimum(r, dims=1), pathlen, paths)
    return w
end

function min_paths_mc(μ, Σ, paths, pathlen)
    Δ = 1.0/pathlen
    d = MvNormal((μ .- 0.5*diag(Σ))*Δ, Σ*Δ)
    r = reshape(rand(d, paths * pathlen), :, pathlen, paths)
    cumsum!(r, r, dims=2)
    w = reshape(minimum(r, dims=1), pathlen, paths)
    return w
end

μ = [0.05, 0.06, 0.04]
Σ = [0.1 0.01 0.02; 0.01 0.1 0.01; 0.02 0.01 0.1]
pathlen = 6
rng = 5000:10000:50000

qmc = zeros(pathlen, length(rng))
mc = zeros(pathlen, length(rng))
for (i,paths) in enumerate(rng)
    qmc_samples = min_paths_qmc(μ, Σ, paths, pathlen)
    mc_samples = min_paths_mc(μ, Σ, paths, pathlen)
    qmc[:,i] .= mean(qmc_samples, dims=2)
    mc[:,i] .= mean(mc_samples, dims=2)
end

QMC result:

6×5 Matrix{Float64}:
 -0.0693403  -0.0686555  -0.130352   -0.142068   -0.142267
 -0.122929   -0.109585   -0.0936552  -0.0863832  -0.10844
 -0.112686   -0.0968324  -0.159948   -0.153843   -0.16357
 -0.142484   -0.117048   -0.112606   -0.0969064  -0.110946
 -0.113757   -0.101172   -0.174056   -0.175542   -0.174227
 -0.133638   -0.121298   -0.117966   -0.121578   -0.123275

MC Result:

6×5 Matrix{Float64}:
 -0.104534  -0.100965  -0.10238   -0.101322  -0.102051
 -0.147406  -0.143225  -0.143192  -0.14371   -0.14456
 -0.176493  -0.177197  -0.176859  -0.175868  -0.177008
 -0.20365   -0.203276  -0.202989  -0.203152  -0.205002
 -0.226064  -0.226917  -0.227966  -0.227     -0.228706
 -0.249139  -0.249241  -0.249541  -0.248267  -0.251252

foobar_lv2 · November 14, 2023, 12:28pm

I’m not sure what you expected here?

Sequences for QMC are intended for numerical integration.

Using quasi monte carlo for sampling / simulation is a category error.

QMC sequences don’t have the law of big numbers / standard deviation 1/sqrt(N) behavior you’d get from random. That’s the entire point of QMC.

tchebycheff · November 14, 2023, 1:04pm

Isn’t E(Y_t) an integral?

ChrisRackauckas · November 14, 2023, 1:04pm

Interesting. This is quite similar to this study here which uses QMC for options pricing which is using the same approach (see page 5)

ChrisRackauckas · November 14, 2023, 1:09pm

Check the case with mean zero first.

tchebycheff · November 14, 2023, 1:21pm

Indeed, it is widely used in finance. What i posted is part of pricing options based on worst-performing stocks.

https://www.broda.co.uk/doc/PricingRiskManagement_Sobol.pdf

foobar_lv2 · November 14, 2023, 3:02pm

Could you clarify your question by making all dimensions clear?

I.e. d: You explained that in the beginning. M: The length of your paths. This is super unclear from your code! N: The number of paths you’re sampling. This is super unclear from your code!

It looks to me like you’re trying to work with d = 3, M = 6 and N = 3?

Could you check with N = 30_000?

Here you sample some number of quasi-random d-dimensional points. This is a category error. You must sample some number of quasi-random M * d-dimensional points.

tchebycheff · November 15, 2023, 12:41am

Each path X is a d x pathlen matrix.
There are paths number of paths.
The for loop checks for different paths from 5_000 to 50_000. The matrices are results from the runs.

foobar_lv2 · November 15, 2023, 10:51am

So I fixed it for you:

julia> function mvn_qmc(mu, sig, npaths, pathlen)
           d = length(mu)
           L = cholesky(sig).L
           samples = 2^ceil(Int, log2(npaths))
           z = reshape(QuasiMonteCarlo.sample(samples, d*pathlen,
               SobolSample(R = OwenScramble(base = 2, pad = 32)))[:,1:npaths], d, npaths*pathlen )
           z .= norminvcdf.(z)
           return mu .+ L*z
       end
mvn_qmc (generic function with 2 methods)

julia> function min_paths_qmc(μ, Σ, paths, pathlen)
           Δ = 1.0/pathlen
           r = mvn_qmc((μ .- 0.5*diag(Σ))*Δ, Σ*Δ, paths,pathlen)
           r = reshape(r, :, pathlen, paths)
           cumsum!(r, r, dims=2)
           w = reshape(minimum(r, dims=1), pathlen, paths)
           return w
       end

However, I’d like to advise you that this code is very terrible.

First a nitpick: Non ascii like your delta or sigma may work fine for you, but this will exclude a significant minority of people (maybe 20%?) from editing your code. Ask yourself whether it’s worth it for cute looking text.

Second, you do a lot of reshaping and broadcast and * for matmul and cumsum with dimensions, etc. This makes it very hard to see what happens. These operations have gravitas and should not be hidden! An explicit loop or explicit matmul call can improve readability.

Third, QMC has absolutely nothing to do with random sampling. Quasi-random and random have absolutely nothing to do with each other, except for sharing the very unfortunate name. QMC is a family of cubature rules. Your original code confused that and used different QMC points within the same time-series. You wouldn’t use grid points of a different cubature rules as noise values for a time-series and then be surprised about wrong results either.

My fix of your code is still terrible on point 3: It is not obvious that this code does the right thing. The reshaping / matmul / cumsum / min business does not make it clear that you never mix values belonging to different cubature points. So it should be re-architected to make that property obvious and that class of bugs impossible.

edit: Apologies if I sounded too harsh. Both the non-ascii letters and the terminology around QMC are not your fault (they’re your problem to fix in your code, at best).

edit: PPS. You don’t need to simulate time-series at all. As long as your random walk is linear, you can directly sample at each time t, independently. Then you get away with a bunch of 3 dimensional integrals, that you can do either by MC or QMC or better cubature rules. In other words: The dimensionality of your integrals does not really scale in pathlen. If you really want to evaluate a low-dimensional integral, then there are better tools than QMC for that.

ChrisRackauckas · November 15, 2023, 12:08pm

A little harsh, let’s tone it down a little. Some of those are subjective points. But yes, I think the readability aspect here is from a “always vectorize” standpoint, and I agree this probably could be much easier to read if there was a loop instead of trying to put everything into a single call. In Julia looping is fine performance wise so it should be used.

Or you can just solve the Fokker-Plank equation and get the full probabilistic evolution. Or in this case you can use the analytical solution. But I would think that this was just an exercise or test of QMC, not necessarily the full application.

tchebycheff · November 15, 2023, 1:35pm

My fix of your code is still terrible on point 3: It is not obvious that this code does the right thing.

I have tried this and this does not work.

However, I’d like to advise you that this code is very terrible.

Thanks, but I don’t give a damn. It looks fine. BTW, non-ascii is pretty common in Julia world.

Third, QMC has absolutely nothing to do with random sampling.

This has nothing to do with sampling. This is numerical integration. Expectations are integrals. E(Y) is a convenient short form to write an integral.

edit: Apologies if I sounded too harsh.

Don’t. Own your nasty. Wear it proudly.

foobar_lv2 · November 15, 2023, 3:08pm

using Distributions, QuasiMonteCarlo, StatsFuns, LinearAlgebra
function min_paths_mc(μ, Σ, paths, pathlen)
    Δ = 1.0/pathlen
    d = MvNormal((μ .- 0.5*diag(Σ))*Δ, Σ*Δ)
    r = reshape(rand(d, paths * pathlen), :, pathlen, paths)
    cumsum!(r, r, dims=2)
    w = reshape(minimum(r, dims=1), pathlen, paths)
    return w
end

function mvn_qmc(mu, sig, npaths, pathlen)
           d = length(mu)
           L = cholesky(sig).L
           samples = 2^ceil(Int, log2(npaths))
           z = reshape(QuasiMonteCarlo.sample(samples, d*pathlen,
               SobolSample(R = OwenScramble(base = 2, pad = 32)))[:,1:npaths], d, npaths*pathlen )
           z .= norminvcdf.(z)
           return mu .+ L*z
       end

function min_paths_qmc(μ, Σ, paths, pathlen)
           Δ = 1.0/pathlen
           r = mvn_qmc((μ .- 0.5*diag(Σ))*Δ, Σ*Δ, paths,pathlen)
           r = reshape(r, :, pathlen, paths)
           cumsum!(r, r, dims=2)
           w = reshape(minimum(r, dims=1), pathlen, paths)
           return w
       end

μ = [0.05, 0.06, 0.04];
Σ = [0.1 0.01 0.02; 0.01 0.1 0.01; 0.02 0.01 0.1];
pathlen = 6;
rng = 5000:10000:50000;

qmc = zeros(pathlen, length(rng));
mc = zeros(pathlen, length(rng));
for (i,paths) in enumerate(rng)
    qmc_samples = min_paths_qmc(μ, Σ, paths, pathlen)
    mc_samples = min_paths_mc(μ, Σ, paths, pathlen)
    qmc[:,i] .= mean(qmc_samples, dims=2)
    mc[:,i] .= mean(mc_samples, dims=2)
end

yields

julia> mc
6×5 Matrix{Float64}:
 -0.100512  -0.101007  -0.101292  -0.101519  -0.101504
 -0.140143  -0.142749  -0.145192  -0.143325  -0.144083
 -0.16857   -0.176598  -0.178625  -0.176286  -0.175524
 -0.197853  -0.204147  -0.205394  -0.204554  -0.202637
 -0.22167   -0.226734  -0.230401  -0.228462  -0.227108
 -0.243607  -0.248088  -0.251407  -0.250059  -0.24931

julia> qmc
6×5 Matrix{Float64}:
 -0.101644  -0.101671  -0.101653  -0.101668  -0.101656
 -0.143707  -0.143885  -0.143689  -0.143768  -0.143762
 -0.175439  -0.175987  -0.175877  -0.176041  -0.176085
 -0.202753  -0.203163  -0.203496  -0.203335  -0.203127
 -0.227238  -0.227362  -0.227741  -0.227291  -0.227111
 -0.247472  -0.248876  -0.249509  -0.248994  -0.249258

Does this match your expectations for the result?

Does this reproduce on your machine?

tchebycheff · November 16, 2023, 5:28am

Indeed, I can reproduce. Thank you.

It has been an interesting exercise. More on collinearity type issues from slicing/reshaping QMC samples discussed in sec 3.2 of this paper.

http://www.axelkind.com/teaching/financial_derivatives/papers/boyle_broadie_glasserman_mc_overview_jedc.pdf

Topic		Replies	Views
Different behaviors on Latin Hypercube Sampling from two packages Statistics	8	756	January 23, 2023
Best random number generator for Monte Carlo simulations Modelling & Simulations	12	979	May 12, 2023
Function rand() or sample() for MCMC sampling and similar? Probabilistic Programming	19	1640	September 23, 2019
[ANN] QuasiMonteCarlo.jl Modelling & Simulations announcement	0	776	September 3, 2019
MVN CDF -- have it coded, need help getting integrating into Distributions.jl Statistics	18	2845	March 25, 2022

QMC vs rand

Related topics