x - c[i]
allocates a temporary 3-component array in your inner loop, which is going to hurt performance a lot. As @lmiq suggested, 3-component coordinate vectors are a perfect case for SVector
from the StaticArrays package.
Also, note that gaussian_kernel
is a non-const
global (as opposed to defining it in the usual way via function gaussian_kernel(...)
.
Finally, use @btime
from the BenchmarkTools package for more accurate timing, and use $
to interpolate global variables when benchmarking.
The huge number of allocations here still indicates a problem. Possibly the type instability from the non-const
definition of gaussian_kernel
. Also, did you forget to use SVector
for grid
?
(If your innermost loops are allocating memory on the heap, you are leaving a lot of performance on the table. You wouldn’t call malloc
in your innermost loop in a C program.)