I am trying to optimize some code that calls quadgk many times to fill a matrix. A MWE is:
using SpecialFunctions
using QuadGK
Lsum = 2
v_arr = zeros(100, Lsum+1);
alpha_arr = rand(100);
gamma_dict = Dict{Float64,Float64}();

function fillgamma(gamma_dict, Lsum)
    for i = 0:0.5:Lsum+1.5
        gamma_dict[i] = gamma(i)
    end
    return gamma_dict
end

function vrad_fun(r)
    return exp(-r^2)
end

function precompute_varr(v_arr, alpha_arr, Lsum, gamma_dict, vrad_fun)
    for n = 0:Lsum
        for k = 1:lastindex(alpha_arr)
            v_arr[k, n+1] = vrad_integration(vrad_fun, alpha_arr[k], n)
        end
    end
    return v_arr
end

function vrad_integration(vrad_fun, alpha, n)
    integrand(r) = vrad_fun(r) * r^(2*n+2) * exp(-alpha*r^2)
    val = quadgk(r -> integrand(r), 0, Inf)[1]
    return val
end

fillgamma(gamma_dict, Lsum)
precompute_varr(v_arr, alpha_arr, Lsum, gamma_dict, vrad_fun)
# @btime precompute_varr(v_arr, alpha_arr, Lsum, gamma_dict, vrad_fun);
#   1.316 ms (2700 allocations: 164.06 KiB)
Benchmarking this yields quite a lot of allocations, and I wonder if I can reduce them somehow. I tried using broadcasting instead of the double for-loop, but this didn't really improve anything; to be honest, it was more trial and error than proper understanding. In reality, the matrix dimensions might be ~10000 × Lsum+1, where Lsum is usually below 10. I also tried SVector from the StaticArrays package, but didn't get any real benefit from it. Maybe I didn't use it appropriately?
I would be happy about any suggestions.
EDIT: I removed a normalization prefactor within the function precompute_varr to reduce the problem to its core.
Thank you for your remark on the in-place operations. I tried to employ quadgk! as suggested by the documentation. (For simplicity, I slightly changed the calculation procedure and removed the norm_prefactor, which was unnecessary; I changed the function definition in the original post accordingly, see EDIT.)
# Try 2:
function precompute_varr2(v_arr, alpha_arr, Lsum, gamma_dict, vrad_fun)
    n_arr = 0:Lsum
    vrad_integration2(v_arr, vrad_fun, alpha_arr, n_arr)
    return v_arr
end

function vrad_integration2(v_arr, vrad_fun, alpha_arr, n_arr)
    integrand2(r) = vrad_fun(r)*r^(2*n+2)*exp(-alpha*r^2)
    integrand2!(y, r) = y .= integrand2(r)
    quadgk!(r -> integrand2!(y, r), v_arr, 0, Inf)[1]
end

v_arr2 = zeros(100, Lsum+1);
precompute_varr2(v_arr2, alpha_arr, Lsum, gamma_dict, vrad_fun)
The error message is a bit unclear to me, but I understand that the problem lies in the definition of integrand2!. However, I copied the notation 1:1 from the documentation, so I’m not sure how to change the definition. Unfortunately, no example is provided there.
I’m not sure what SVector has to do with it, since your integrand seems to be a scalar?
quadgk internally does a bunch of allocations to build a heap of integration segments that it adaptively refines. You can re-use a pre-allocated segment buffer by creating one with the alloc_segbuf function and passing it to the segbuf argument of quadgk.
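A minimal sketch of that pattern (the integrand here is only illustrative):

```julia
using QuadGK

# Allocate the segment buffer once; the default alloc_segbuf() is sized
# for Float64 endpoints, Float64 integrand values, and Float64 errors.
buf = alloc_segbuf()

# Reuse it across many integrations to avoid rebuilding the heap each time:
for a in 1.0:1.0:5.0
    val, err = quadgk(r -> exp(-a * r^2), 0, Inf; segbuf=buf)
end
```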
(You can also experiment with other quadrature schemes. One option would be a change of variables u = \sqrt{\alpha}\, r followed by Gauss–Hermite quadrature or similar (or u = \sqrt{\alpha + 1}\, r if you include the exponential from your vrad_fun, though maybe that is just a toy example). Even with quadgk, a change of variables might be advisable, as otherwise it could be quite inefficient for very large \alpha; see also the discussion of infinite limits in the manual. Another option might be DoubleExponentialFormulas.jl.)
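A sketch of the suggested change of variables followed by Gauss–Hermite quadrature; FastGaussQuadrature.jl is assumed here for the nodes and weights, and the helper name vrad_gh is hypothetical:

```julia
using FastGaussQuadrature

# With u = √α·r the integral ∫₀^∞ f(r) r^(2n+2) e^(-αr²) dr becomes
# α^-(n+3/2) ∫₀^∞ f(u/√α) u^(2n+2) e^(-u²) du, whose e^(-u²) weight is
# absorbed by the Gauss–Hermite rule gausshermite(N), valid on (-∞, ∞).
function vrad_gh(vrad_fun, alpha, n; N=40)
    x, w = gausshermite(N)
    # The weighted integrand g(u) = vrad_fun(u/√α)·u^(2n+2) is even when
    # vrad_fun depends only on r², so the half-line integral is half the
    # full-line quadrature sum.
    s = sum(w[i] * vrad_fun(x[i] / sqrt(alpha)) * x[i]^(2n + 2) for i in eachindex(x))
    return s / (2 * alpha^(n + 1.5))
end
```

This is fixed-order rather than adaptive, so it allocates essentially nothing per integral once the nodes are cached, but accuracy should be checked against quadgk for the actual vrad_fun.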
You can do this, but there is also a tradeoff if you try to express the whole array as an integral of an array-valued function (rather than one integral per element). If you try to do the whole array at once, the amount of refinement needed could be dominated by a few badly behaved elements, so you pay the price for the whole array rather than just for those elements. And there are also more allocations required to store intermediate results. So you aren’t guaranteed to be faster just because it is “vectorized”.
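For reference, the array-valued variant discussed above looks like this (the integrand is illustrative); one quadgk call then returns a whole vector of integrals, with refinement driven by the norm of the vector error:

```julia
using QuadGK

alphas = [0.5, 1.0, 2.0]

# One integration for all elements at once; every element pays for the
# refinement demanded by the worst-behaved one.
vals, err = quadgk(r -> [exp(-a * r^2) for a in alphas], 0, Inf)
# vals[i] ≈ ∫₀^∞ exp(-alphas[i]·r²) dr = √(π / alphas[i]) / 2
```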
Your original code produced the following @btime output on my machine:
1.220 ms (2700 allocations: 164.06 KiB)
I removed the gamma_dict stuff since it wasn’t involved in your calculation, added a few other modifications aimed at performance, and followed @stevengj’s advice regarding the use of alloc_segbuf. The result is the following code:
using SpecialFunctions
using QuadGK
using BenchmarkTools

Lsum = 2
v_arr = zeros(100, Lsum+1)
alpha_arr = rand(100)

function vrad_fun2(r²)
    return exp(-r²)
end

function precompute_varr!(v_arr, alpha_arr, Lsum)
    buf = alloc_segbuf()
    for n = 0:Lsum
        for k = 1:lastindex(alpha_arr)
            v_arr[k, n+1] = vrad_integration(alpha_arr[k], n, buf)
        end
    end
    return v_arr
end

function integrand(r, alpha, n)
    r² = r^2
    vrad_fun2(r²) * r²^(n+1) * exp(-alpha * r²)
end

function vrad_integration(alpha, n, buf)
    val = quadgk(r -> integrand(r, alpha, n), 0, Inf; segbuf=buf)[1]
    return val
end

precompute_varr!(v_arr, alpha_arr, Lsum)
@btime precompute_varr!($v_arr, $alpha_arr, $Lsum)
Thank you for the references, I will read further. Indeed, the exact form of vrad_fun was a toy example, as I can solve the integral analytically in that case. However, the exp(-alpha*r^2) term in the integrand should always ensure that the integrand falls off at large distances and is integrable. I will have a look at whether the change of variables or DoubleExponentialFormulas might be beneficial for future use cases.
Thank you, I guess I needed a bit of a helping hand on how to implement the in-place function. It's also great that you directly incorporated the hint about alloc_segbuf. I will accept this as the solution and study your example in more detail now.
I played around more with the solution suggested by @PeterSimon and figured out that many of the allocations in the original code actually stem from passing the function vrad_fun around as an argument, not from quadgk itself. In the example provided here, it is not obvious why this would be necessary, but in my actual use case I need vrad_fun to be passable as an argument rather than defined in global scope.
Reading more about it in the manual and in the thread Memory allocation with function passed as arguments, I realized that I need to force specialization on the function type via a where {V} type parameter (see code below). With this, I managed to get the same number of allocations as in PeterSimon's reply while keeping vrad_fun as a function argument. For completeness, and on the small chance it might help someone in the future, I post my solution:
using SpecialFunctions
using QuadGK
using BenchmarkTools

Lsum = 2
v_arr = zeros(100, Lsum+1)
alpha_arr = rand(100)

function vrad_fun(r)
    return exp(-r^2)
end

function precompute_varr!(v_arr, alpha_arr, Lsum, vrad_fun::V) where {V}
    buf = alloc_segbuf()
    for n = 0:Lsum
        for k = 1:lastindex(alpha_arr)
            v_arr[k, n+1] = vrad_integration2(vrad_fun, alpha_arr[k], n, buf)
        end
    end
    return v_arr
end

function integrand(r, alpha, n, vrad_fun)
    return vrad_fun(r) * r^(2*n+2) * exp(-alpha*r^2)
end

function vrad_integration2(vrad_fun, alpha, n, buf)
    return quadgk(r -> integrand(r, alpha, n, vrad_fun), 0, Inf; segbuf=buf)[1]
end

v_arr2 = zeros(100, Lsum+1);
@btime precompute_varr!($v_arr2, $alpha_arr, $Lsum, vrad_fun)
# 1.001 ms (2 allocations: 368 bytes)
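The effect of the where {V} annotation can be seen in isolation with a small sketch (all names here are made up for illustration): Julia may skip specializing on a Function-typed argument that is only passed along rather than called, and the type parameter forces the specialization.

```julia
apply_inner(f, x) = f(x)

# f is only forwarded here, never called, so Julia may leave this method
# unspecialized on the concrete function type (one source of allocations):
outer_nospec(f, x) = apply_inner(f, x)

# The where {F} type parameter forces a specialized method per function type:
outer_spec(f::F, x) where {F} = apply_inner(f, x)
```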