I’ll start off by saying that QuadGK is really fast. But I want to know how to make it faster for what I think is a relatively common use case, which is its vectorized form. Broadcasting the `quadgk` call itself performs as we’d expect (slightly slower than two times a single call), whereas broadcasting the function that we are integrating (so that the integrand is vector-valued) results in a slowdown.

In this specific case, there should be speedups available: although the integrand is vector-valued, it is basically copies of similar integrals that differ only by the parameter `a`, so it seems like `quadgk` could reuse some computations.
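For instance, one way to share work across parameters (a sketch, not what `quadgk` does internally: it uses QuadGK’s fixed-order `gauss` rule, so there is no adaptive error control) is to compute the quadrature nodes once and evaluate every integrand at those shared points:

```julia
using QuadGK

f(x, a) = exp(-a * x^2)
avec = [0.5, 1.0, 1.5]

# Nodes and weights of a fixed-order Gauss rule on [0, 1], computed once;
# they don't depend on `a`, so they are shared by all of the integrals.
x, w = gauss(20, 0, 1)

# Each integral is then just a weighted sum of 20 function evaluations.
vals = [sum(w .* f.(x, a)) for a in avec]
```

For a smooth integrand like this one, a modest fixed order already matches the adaptive result to near machine precision, but you give up `quadgk`’s error estimate.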

``````julia
julia> using QuadGK, BenchmarkTools

julia> f(x,a) = exp(-a * x^2)
f (generic function with 1 method)

julia> quad1(a) = quadgk(x -> f(x, a), 0, 1)
quad1 (generic function with 1 method)

julia> quadvec(a) = quadgk(x -> f.(x, a), 0, 1)
quadvec (generic function with 1 method)

julia> avec = [0.5; 1.0 ;1.5]
3-element Array{Float64,1}:
0.5
1.0
1.5

julia> @btime quad1(1.0)
  976.333 ns (39 allocations: 992 bytes)
(0.746824132812427, 0.0)

julia> @btime quad1.($avec)
  2.329 μs (87 allocations: 2.25 KiB)
3-element Array{Tuple{Float64,Float64},1}:
 (0.8556243918921488, 2.55351295663786e-15)
 (0.746824132812427, 0.0)
 (0.6633509458403348, 1.1102230246251565e-16)

julia> @btime quadvec($avec)
  5.054 μs (157 allocations: 16.36 KiB)
([0.8556243918921488, 0.746824132812427, 0.6633509458403348], 1.3401577416544657e-16)

``````

Hmm, OK, I guess it does just fine! It just depends on how many integrals it needs to compute.

``````julia
julia> avecB = rand(1000)
1000-element Array{Float64,1}:
0.9109829394262448
0.7401979941321699
0.004438422199036562
0.7622184357652708
0.3752119597124437
0.0870799867237182
0.7612865115744936
0.8395578809491078
⋮
0.48755227092235875
0.06770987500160586
0.029118998270370566
0.8758783271579778
0.1163149703976627
0.9930090806821483
0.09210918209385932

julia> @btime quadvec($avecB);
  217.334 μs (7171 allocations: 191.77 KiB)
``````

> If you want this to be fast for small vectors `a`, you should make `a` into an `SVector` (from the StaticArrays package), so that all the loops are unrolled and you aren’t allocating zillions of tiny arrays on the heap.
> (If `a` is a large vector, the overhead of heap-allocating temporary arrays etcetera becomes negligible.)
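Concretely, that suggestion might look like the following (a sketch assuming the same `f` as above and integration over `[0, 1]`, as the earlier results suggest):

```julia
using QuadGK, StaticArrays

f(x, a) = exp(-a * x^2)

# A small, fixed-size parameter vector: broadcasting over it produces
# another SVector, so each integrand evaluation is allocation-free and
# the loops over the three components are unrolled.
a_static = SVector(0.5, 1.0, 1.5)

# quadgk integrates the SVector-valued integrand componentwise,
# returning an SVector of integrals and a single error bound.
result, err = quadgk(x -> f.(x, a_static), 0, 1)
```

This works because `quadgk` accepts integrands with values in any normed vector space, and `SVector` is one.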
Great! That completely switches the conclusion for small `n` and even shaves off some time from the `n=1000` case. I don’t know yet what my problem size will be, but will have to keep this in mind.