Advice on numerical integration of a very "sparse" function



I have a function that, when sampled evenly across its two-dimensional domain, looks something like this:

Almost everything is zero…
I need to integrate it. I tried HCubature.jl and Cuba.jl, but they either returned zero or failed (see related issue). I’m not looking for ultra accuracy, but it would be good to get a result that isn’t zero.
Another approach would be to use Interpolations.jl somehow. While there should be a good way of doing this (see this issue), there should also be a sub-optimal but simpler way of integrating it…

Does anyone have any suggestions on how…?



ApproxFun might work here:

@time f = Fun((x,y) -> exp(-1000*((x-0.4)^2+y^2)), (0..1)^2) # 0.05s
sum(f) # 0.0015707963267948962, in 0.0025s


Can’t believe it, I got zero again…! Hmmmmmm


Is norm(f.coefficients) == 0? If so it’s somehow missed the hump. Dividing your rectangle in two so a corner is at the hump should fix it.

Otherwise, maybe the integral is zero?
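A minimal sketch of the domain-splitting idea, assuming a bump near (0.4, 0) as in the earlier exp example, and assuming ApproxFun accepts rectangular product domains via `×`:

```julia
using ApproxFun

# Stand-in sharp bump centred at (0.4, 0); splitting the square so a
# corner lands at the bump helps the adaptive construction find it.
g = (x, y) -> exp(-1000 * ((x - 0.4)^2 + y^2))

left  = Fun(g, (0..0.4) × (0..1))
right = Fun(g, (0.4..1) × (0..1))

sum(left) + sum(right)  # total integral over (0..1)^2
```

The split puts the hump on a panel boundary, so the Chebyshev sampling of each sub-rectangle cannot step over it entirely.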


Do you know in advance where the bump is?


It seems a bit too simple, but sum() over the samples might work?

A better estimate might use a [1 1; 1 1] / 4 filter, estimating each box height from its four corners.
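The corner-averaging filter above amounts to a composite 2D trapezoidal rule on a uniform grid; a minimal base-Julia sketch (the function name and grid are my own illustration):

```julia
# 2D trapezoid-style estimate from a uniform grid of samples:
# each cell's height is the mean of its four corners ([1 1; 1 1] / 4 filter).
function gridintegrate(f, xs, ys)
    hx, hy = step(xs), step(ys)
    F = [f(x, y) for x in xs, y in ys]
    s = 0.0
    for i in 1:length(xs)-1, j in 1:length(ys)-1
        s += (F[i, j] + F[i+1, j] + F[i, j+1] + F[i+1, j+1]) / 4
    end
    return s * hx * hy
end

gridintegrate((x, y) -> x * y, range(0, 1, length=101), range(0, 1, length=101))
# ≈ 0.25 for this smooth integrand
```

For a narrow bump this still needs a grid fine enough that several cells land inside the bump, otherwise the estimate is dominated by sampling noise.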


was zero, but

resulted in this:

julia> @time f = Fun((x,y) -> getsignal([x,y]), (0.82..1)^2)
┌ Warning: Maximum number of coefficients 1048577 reached in constructing Fun.                         
└ @ ApproxFun ~/.julia/packages/ApproxFun/DkMmw/src/Fun/constructors.jl:141
 16.831609 seconds (209.32 M allocations: 5.272 GiB, 13.14% gc time)
Fun(Chebyshev(0.82..1.0)⊗Chebyshev(0.82..1.0),[0.0165691, 0.031849, -0.0267369, 0.0281701, -0.05141, 0.0114935, 0.0226335, -0.0455183, 0.0221598, 0.00361066  …  6.72985e-9, 1.8547e-8, -1.45913e-7, 2.26589e-7, -1.73169e-7, 5.40419e-8, 7.07458e-9, 2.9705e-8, -1.05139e-7, 3.52394e-8])

julia> sum(f)

That sum call was also pretty slow…

No, I do not. But I could uniformly sample about 100 points and find it that way…

This might not work, since I need to compare the results of this integrand to others where the number of hits (i.e. locations where the function is nonzero) will differ, which would make the comparison very susceptible to noise.


You’re lucky that your function is only 2d.

So your first question should be: how do you discover the needle in the haystack (the regions where the function is far from zero)? An easy approach might be to use IntervalRootFinding.jl on f - ϵ. That exploits the fact that you have structural knowledge of f:

You don’t just have an oracle for f or its derivatives; no, you have the machine that computes f! This of course only works for some functions (though for most reasonable functions you can write down in Julia), and it is only fast for certain functions. Once you have discovered the needle in the haystack, you can zoom in with HCubature.

Alternatively, you can take a lot of samples. That may miss narrow spikes with large mass; depending on your f, one or the other way may be more efficient.

Note that the usefulness of IntervalRootFinding depends not only on your mathematical function (the relation between input and output) but crucially on the explicit algorithm used to compute f.
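The needle-finding idea can be sketched with plain interval arithmetic as a branch-and-prune: discard any box on which the interval extension of f provably stays below ϵ, and bisect the rest. This is my own sketch, assuming the `interval`, `sup`, `diam`, and `bisect` functions from IntervalArithmetic.jl:

```julia
using IntervalArithmetic

# Keep only boxes where f cannot be shown to stay below ϵ.
# What survives contains every "needle" (nonzero region).
function findneedles(f, X, Y; ϵ = 1e-8, tol = 1e-2)
    found = Tuple{typeof(X), typeof(Y)}[]
    stack = [(X, Y)]
    while !isempty(stack)
        (x, y) = pop!(stack)
        fxy = f(x, y)                # interval enclosure of f on the box
        sup(fxy) < ϵ && continue     # provably tiny here: discard box
        if diam(x) < tol && diam(y) < tol
            push!(found, (x, y))     # small box that may contain mass
        else
            x1, x2 = bisect(x)
            y1, y2 = bisect(y)
            append!(stack, [(x1, y1), (x1, y2), (x2, y1), (x2, y2)])
        end
    end
    return found
end

# e.g. findneedles((x, y) -> exp(-1000*((x - 0.4)^2 + y^2)),
#                  interval(0, 1), interval(0, 1))
```

Because f must be evaluated on intervals, this only works when f is written from operations interval arithmetic supports, which is exactly the "machine that computes f" caveat above.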


Presumably you are looking for some accuracy, otherwise

gimmeresult() = 42.0

would work (as it isn’t zero :troll:).

Can you provide an MWE that is similar to your getsignal?


hmm, getsignal is the result of ray tracing a scallop eye. Its two dimensions are the elevation and azimuth angles of the light leaving a point source. At larger viewing angles only a fraction of the light enters the eye and ends up as a “signal”. I could set it all up for you to explore…
Right now I’m testing Interpolations and then some sort of trapezoidal rule… Maybe Julia already has a trapz function?
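Base Julia has no built-in trapz (the Trapz.jl package provides one), but a minimal 1D version is only a line:

```julia
# Composite trapezoidal rule over sample points xs with values ys.
trapz(xs, ys) = sum((ys[i] + ys[i+1]) * (xs[i+1] - xs[i]) / 2 for i in 1:length(xs)-1)

xs = range(0, pi, length=1001)
trapz(xs, sin.(xs))   # ≈ 2.0, the exact value of ∫₀^π sin
```

For the 2D case you can apply it along one axis per row of samples and then along the other axis over the row results.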


If the function is “smooth” (continuous and maybe reasonably differentiable, or a good approximation thereof), you should be much better off with adaptive cubature and/or ApproxFun.jl. A naive trapezoidal rule is kind of a last resort for badly behaved functions, at a large cost of accuracy. The fact that the aforementioned methods fail suggests that exploration using an MWE could be helpful.

Also, for debugging you could try to plot/integrate a 1D slice, eg one along the y axis that contains the “bump”.
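For the 1D-slice debugging idea, QuadGK.jl works well; here is a sketch using a stand-in bump at an assumed location (0.4, 0):

```julia
using QuadGK

# Stand-in for getsignal, with a hypothetical bump at (0.4, 0).
g(x, y) = exp(-1000 * ((x - 0.4)^2 + y^2))

# Integrate a 1D slice along y that passes through the bump.
I, E = quadgk(y -> g(0.4, y), -1, 1)
```

If the slice integral comes out nonzero but the 2D cubature returns zero, the 2D method is simply never sampling near the bump.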


It’s smooth where it’s > 0; everywhere else it’s zero. I think each bump might even have just one maximum… but I’m not sure. It just irritates me that while I can quickly see where all the bumps are by evaluating 100×100 points, neither of the Cuba-related packages picks them up, because if they did, everything would be fine. Finding the needle first, like @foobar_lv2 suggests, might be most robust, but it seems a bit like overkill to me. I might be wrong.
I’ll try and set it up then…!


If you know the bounds, perhaps you can brute force an approximate integral using Sobol sequences,

which should also be a reasonable quick & dirty solution for finding the (proximity of the) bump.
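A quick sketch of that idea with Sobol.jl, using a stand-in bump for getsignal (the threshold and point count are arbitrary choices of mine):

```julia
using Sobol

# Stand-in for getsignal, with a bump at (0.4, 0.1).
g(x, y) = exp(-1000 * ((x - 0.4)^2 + (y - 0.1)^2))

# Scan the unit square with a low-discrepancy sequence; keep the hits.
s = SobolSeq(2)
pts = [next!(s) for _ in 1:1000]
hits = [p for p in pts if g(p[1], p[2]) > 1e-6]

# Crude quasi-Monte-Carlo estimate of the integral over the same points:
est = sum(g(p[1], p[2]) for p in pts) / length(pts)
```

Sobol points cover the square far more evenly than pseudorandom samples for the same budget, so narrow bumps are less likely to fall through the cracks.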


We can add some keyword argument to force HCubature to subdivide the domain at the beginning to ensure that it is sampled more finely to start with.
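Until such a keyword exists, one can pre-subdivide by hand and sum hcubature over the panels; a sketch, assuming HCubature's `hcubature(f, a, b)` interface on the unit square:

```julia
using HCubature

# Split (0,1)² into an n×n grid of panels so every panel gets sampled,
# then let hcubature adapt inside each one.
function subdivided_hcubature(f, n)
    total = 0.0
    for i in 0:n-1, j in 0:n-1
        a = (i / n, j / n)
        b = ((i + 1) / n, (j + 1) / n)
        I, _ = hcubature(x -> f(x[1], x[2]), a, b)
        total += I
    end
    return total
end

subdivided_hcubature((x, y) -> exp(-1000 * ((x - 0.4)^2 + y^2)), 8)
```

With n chosen so that panels are comparable to the bump width, the initial cubature points in at least one panel land on the bump, and adaptivity takes it from there.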


I’m testing Cuba.jl’s xgiven to see if that helps (from giordano’s reply)…
Also, trying Sobol.jl, cool!


Maybe try some more options with cuba? Here is an example where cuba finds even a much smaller 2d peak out of the box:

using Cuba

function f!(v, out)
    # tiny circle of radius r
    # expected integral is 1
    r = 1e-3
    x, y = v
    if (x - 0.5)^2 + (y - 0.5)^2 < r^2
        out[1] = 1 / (pi * r^2)
    else
        out[1] = 0
    end
end

@show vegas(f!,2)
vegas(f!, 2) = Component:
 1: 0.920180950098721 ± 0.0004369999395848204 (prob.: 1.0)
Integrand evaluations: 1007500
Fail:                  1
Number of subregions:  0



julia> vegas(f!,2)
 1: 0.0 ± 7.025180405943273e-18 (prob.: -999.0)
Integrand evaluations: 1000
Fail:                  0
Number of subregions:  0


Your function was too easy: 0.5 is an obvious sample point. E.g.

julia> function f!(v, out)
           # tiny circle of radius r
           # expected integral is 1
           r = 1e-3
           xm = 0.7654
           ym = 0.4567
           x, y = v
           if (x - xm)^2 + (y - ym)^2 < r^2
               out[1] = 1 / (pi * r^2)
           else
               out[1] = 0
           end
       end
julia> @show vegas(f!, 2)
vegas(f!, 2) = Component:
 1: 0.0 ± 7.025180405943273e-18 (prob.: -999.0)
Integrand evaluations: 1000
Fail:                  0
Number of subregions:  0


This is a nice example: for any given numerical integration method, one can choose an (xm, ym) and an r small enough that the method fails. For a smoothly differentiable function one could find the peak numerically (especially after a meaningful transformation) and work around it, or use MCMC, but not for this one.


So I used Sobol to improve the sampling, and with just 25 points I already hit all the live areas. I then fed those into Cuba via the xgiven optional argument of divonne, and it works! Pretty promising…
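For reference, a sketch of wiring Sobol hits into divonne; the keyword names ngiven, ldxgiven, and xgiven follow the underlying Cuba library and are assumed here to be exposed as-is by Cuba.jl, and the peak matrix is an illustration:

```julia
using Cuba

# Hypothetical peak locations found by the Sobol scan: one column per point.
peaks = [0.4 0.7;
         0.0 0.1]               # 2 × ngiven matrix of hints in [0,1]²

# Stand-in integrand with a bump at (0.4, 0); Cuba integrands write into f.
result = divonne((x, f) -> f[1] = exp(-1000 * ((x[1] - 0.4)^2 + x[2]^2)), 2, 1;
                 ngiven = size(peaks, 2),
                 ldxgiven = size(peaks, 1),
                 xgiven = peaks)
```

divonne then starts its region subdivision around the given points instead of hoping its default sampling happens to land on the bumps.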