I am new to Julia and currently trying to emulate some practices I'm used to in Python. One problem I am struggling with is how best to iterate over some of the indices of an array. For example, consider an array of size `(nx, ny, nz)`. I want to apply a function that works on 1D arrays, so I loop over `nx` and `ny` to work on a subset of the array, say `M[i, j, :]`. What is the best way to achieve this?
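To make the pattern concrete, here is a toy version of what I mean, with `cumsum` standing in for the real 1D function:

```julia
# Toy (nx, ny, nz) array; cumsum stands in for the real 1D routine.
nx, ny, nz = 2, 3, 4
A = reshape(collect(1.0:24.0), (nx, ny, nz))
out = similar(A)

for j in 1:ny, i in 1:nx
    out[i, j, :] = cumsum(A[i, j, :])   # apply the 1D function to the slice A[i, j, :]
end
```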
Here is an example with a cumulative (trapezoidal) integration function, where my best Julia code is about twice as slow as the Python version. I can't quite figure out why Python is doing better here. Here's the Python code and timings, for reference:
```python
import numpy as np
from scipy.integrate import cumtrapz

(NX, NY, NZ) = (256, 256, 75)
z = np.linspace(0., 1.e3, NZ)
M = np.ones((NX, NY, NZ), dtype='f8')
out = np.empty_like(M)
```
```
%timeit cumtrapz(M[100, 100], x=z)
12.4 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit cumtrapz(M, x=z, axis=2)
44.3 ms ± 2.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
With Julia, I'm using NumericalIntegration.jl, which provides a function called `cumul_integrate` that works similarly to `cumtrapz`. Here is my first try with Julia:
```julia
import NumericalIntegration: integrate, cumul_integrate, TrapezoidalFast
using BenchmarkTools

(NX, NY, NZ) = (256, 256, 75)
z = LinRange(0., 1.e3, NZ)
M = ones(Float64, (NX, NY, NZ))
out = similar(M);
```
```julia
@btime cumul_integrate(z, M[100, 100, :], TrapezoidalFast());
  871.545 ns (6 allocations: 3.63 KiB)
```
```julia
@btime begin
    for j = 1:256, i = 1:256
        out[i, j, :] = cumul_integrate(z, M[i, j, :], TrapezoidalFast())
    end
end
  127.190 ms (458752 allocations: 234.00 MiB)
```
You can see that in the single-column case, the ~871 ns time is quite a bit faster than Python's 12.4 µs, which clearly carries more overhead. However, the Julia loop does not scale with the number of iterations: 256 × 256 × 871 ns should come to around 57 ms, but the result is about twice that (and almost three times slower than Python).
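My guess (unverified) is that part of the gap is allocation: in Julia, indexing with `M[i, j, :]` copies the slice on every iteration, while a `@view` wraps the same memory without copying. A small sketch of the difference:

```julia
M = ones(256, 256, 75)

slice_copy(M, i, j) = sum(M[i, j, :])        # indexing with : allocates a fresh 75-element array
slice_view(M, i, j) = sum(@view M[i, j, :])  # a view reuses the parent's memory, no copy

# Warm up first so compilation does not count as allocation:
slice_copy(M, 1, 1); slice_view(M, 1, 1)
println(@allocated slice_copy(M, 100, 100))  # hundreds of bytes per call
println(@allocated slice_view(M, 100, 100))  # essentially zero
```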
Experimenting a bit, I found a more efficient approach in Julia by transposing the original array and using `selectdim`. This is the fastest I was able to get:
```julia
M = ones(Float64, (75, 256, 256))
out = similar(M);
```
```julia
@btime begin
    for j = 1:256, i = 1:256
        @inbounds out[:, i, j] = cumul_integrate(z, selectdim(selectdim(M, 2, i), 2, j), TrapezoidalFast())
    end
end
  85.986 ms (786432 allocations: 201.00 MiB)
```
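As I understand it, the transposed layout helps because Julia arrays are column-major, so with the integration axis first each column `M[:, i, j]` is contiguous in memory. The nested `selectdim` calls should also be equivalent to a plain `@view`, which I find easier to read; a quick check of that equivalence:

```julia
M = ones(Float64, (75, 256, 256))

v1 = selectdim(selectdim(M, 2, 10), 2, 20)  # the nested-selectdim slice from above
v2 = @view M[:, 10, 20]                     # same elements, same no-copy semantics

# Both are views into M, not copies of the data.
```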
Unfortunately, it is still about twice as slow as Python. Am I doing something obviously wrong here?
Any tips on how to better structure loops in Julia that call functions of lower dimensionality would be very welcome. In this case the cumulative integration is easy enough to code from scratch, but in general that is not always possible or desirable.
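For completeness, here is the kind of from-scratch version I mean: a hand-rolled in-place cumulative trapezoid along the first dimension. This is my own sketch of the standard recurrence `out[k] = out[k-1] + 0.5 * (z[k] - z[k-1]) * (f[k] + f[k-1])`, not the NumericalIntegration.jl implementation:

```julia
# In-place cumulative trapezoidal integral along dimension 1 of a 3D array,
# matching the transposed (nz, nx, ny) layout used above.
function cumtrapz!(out, z, M)
    nz, nx, ny = size(M)
    @inbounds for j in 1:ny, i in 1:nx
        out[1, i, j] = 0.0
        for k in 2:nz
            out[k, i, j] = out[k-1, i, j] +
                0.5 * (z[k] - z[k-1]) * (M[k, i, j] + M[k-1, i, j])
        end
    end
    return out
end

z = LinRange(0.0, 1.0e3, 75)
M = ones(Float64, (75, 256, 256))
out = similar(M)
cumtrapz!(out, z, M)
```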