Many thanks! Using your new approach, I’m now getting ~2.4 seconds for the calculations on my machine. As you said, the improvements are due to inlining and using views. At the moment, I’m not seeing a difference when substituting `@inbounds @simd` for `@avx`. Also, passing a function `kf::F` instead of `K::AbstractKernel` doesn’t change the performance. Even though your version currently has `dotkernel` hard-coded in your `kernelfn` function, it doesn’t seem to change the performance, probably because the compiler takes care of it.
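
A minimal sketch of why the two variants can end up equally fast, under some assumptions: the names `kernelsum` and `kernelsum_hardcoded` and the scalar `dotkernel` below are hypothetical stand-ins, not the code from this thread. Every Julia function has its own type, so annotating the argument as `kf::F where {F}` lets the compiler specialize the method on the function that is passed in, and it can then typically inline the call much as if it were hard-coded.

```julia
# Hypothetical scalar stand-in for the kernel; not the thread's actual `dotkernel`.
dotkernel(x, y) = x * y

# Kernel passed as an argument. The type parameter `F` asks Julia to
# compile a specialized method for each concrete function type.
function kernelsum(kf::F, xs::AbstractVector, ys::AbstractVector) where {F}
    s = zero(promote_type(eltype(xs), eltype(ys)))
    # `eachindex(xs, ys)` throws if the two arrays do not share indices,
    # which is what makes skipping bounds checks safe here.
    @inbounds @simd for i in eachindex(xs, ys)
        s += kf(xs[i], ys[i])
    end
    return s
end

# Same loop with the kernel hard-coded, for comparison.
function kernelsum_hardcoded(xs::AbstractVector, ys::AbstractVector)
    s = zero(promote_type(eltype(xs), eltype(ys)))
    @inbounds @simd for i in eachindex(xs, ys)
        s += dotkernel(xs[i], ys[i])
    end
    return s
end

xs = [1.0, 2.0, 3.0]
ys = [4.0, 5.0, 6.0]
println(kernelsum(dotkernel, xs, ys))
println(kernelsum_hardcoded(xs, ys))
```

Timing both functions on larger inputs (for example with BenchmarkTools.jl) would be the way to check whether they really match on a given machine; this sketch only shows the structure being compared.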