"vectorized" QuadGK?

I’ll start off by saying that QuadGK is really fast. But I want to know how to make it faster for what I think is a relatively common usecase, which is its vectorized form. Broadcasting the quadgk call itself performs as we’d expect (slightly slower than two times a single call), whereas if broadcasting the function that we are integrating (so that the integrand is vector valued) results in a slowdown.

In this specific case, there should be speedups available because although the integrand is vector valued it is basically copies of similar integrals which differ only by the parameter a (so that it seems like quadgk could reuse some computations).

julia> using QuadGK, BenchmarkTools

julia> f(x,a) = exp(-a * x^2)
f (generic function with 1 method)

julia> quad1(a) = quadgk(x ->f(x,a), 0, 1, rtol=1e-12)
quad1 (generic function with 1 method)

julia> quadvec(a) = quadgk(x ->f.(x,a), 0, 1, rtol=1e-12)
quadvec (generic function with 1 method)

julia> avec = [0.5; 1.0 ;1.5]
3-element Array{Float64,1}:
 0.5
 1.0
 1.5

julia> @btime quad1($1.0)
  976.333 ns (39 allocations: 992 bytes)
(0.746824132812427, 0.0)

julia> # broadcasting quadgk
       @btime quad1.($avec)
  2.329 μs (87 allocations: 2.25 KiB)
3-element Array{Tuple{Float64,Float64},1}:
 (0.8556243918921488, 2.55351295663786e-15)
 (0.746824132812427, 0.0)
 (0.6633509458403348, 1.1102230246251565e-16)

julia> # broadcasting f
       @btime quadvec($avec)
  5.054 μs (157 allocations: 16.36 KiB)
([0.8556243918921488, 0.746824132812427, 0.6633509458403348], 1.3401577416544657e-16)

Hmm ok I guess that it does just fine! It just depends on how many integrals it needs to compute.

julia> avecB = rand(1000)
1000-element Array{Float64,1}:
 0.9109829394262448
 0.7401979941321699
 0.004438422199036562
 0.7622184357652708
 0.3752119597124437
 0.0870799867237182
 0.7612865115744936
 0.8395578809491078
 ⋮
 0.48755227092235875
 0.06770987500160586
 0.029118998270370566
 0.8758783271579778
 0.1163149703976627
 0.9930090806821483
 0.09210918209385932

julia> # broadcasting quadgk
       @btime quad1.($avecB)
  217.334 μs (7171 allocations: 191.77 KiB)


julia> # broadcasting quadgk
       @btime quadvec($avecB)
  106.152 μs (50 allocations: 373.16 KiB)

Would still appreciate suggestions for improvements here, but less necessary.

If you want this to be fast for small vectors a you should make a into an SVector (from the StaticArrays package), so that all the loops are unrolled and you aren’t allocating zillions of tiny arrays on the heap.

(If a is a large vector, the overhead of heap-allocating temporary arrays etcetera become negligible.)

2 Likes

Great! That completely switches the conclusion for small n and even shaves off some time from the n=1000 case. I don’t know yet what my problem size will be, but will have to keep this in mind.