Looping over subset of array indices: Julia vs Python

snowgum · April 21, 2020, 1:26pm

I am new to Julia and currently trying to emulate some practices I’m used to in Python. One problem I am struggling with is how to best iterate over some of the indices of an array. For example, consider an array with size (nx, ny, nz). I want to apply a function that works on 1D arrays, so I do a loop over nx and ny to work on a subset of the array, say [i, j, :]. What is the best way to achieve this?

Here is an example for some code with a cumulative sum function, where I find my best Julia code is about twice as slow as Python. I can’t quite figure out why exactly Python is doing better here. Here’s the Python code and timings, for reference:

import numpy as np
from scipy.integrate.quadrature import cumtrapz 

(NX, NY, NZ) = (256, 256, 75)
z = np.linspace(0., 1.e3, NZ)
M = np.ones((NX, NY, NZ), dtype='f8')
out = np.empty_like(M)

%timeit cumtrapz(M[100, 100], x=z)

12.4 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit cumtrapz(M, x=z, axis=2)

44.3 ms ± 2.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

With Julia, I’m using NumericalIntegration.jl which provides a function called cumul_integrate that works similarly to cumtrapz. Here is my first try with Julia:

import NumericalIntegration: integrate, cumul_integrate, TrapezoidalFast
using BenchmarkTools

(NX, NY, NZ) = (256, 256, 75)
z = LinRange(0., 1.e3, NZ)
M = ones(Float64, (NX, NY, NZ))
out = similar(M);

@btime cumul_integrate(z, M[100, 100, :], TrapezoidalFast());

871.545 ns (6 allocations: 3.63 KiB)

@btime begin
    for j=1:256, i=1:256
        out[i, j, :] = cumul_integrate(z, M[i, j, :], TrapezoidalFast())
    end
end

127.190 ms (458752 allocations: 234.00 MiB)

You can see that in the case of only one column, the ~871 ns time is quite a bit faster than Python (12.4 µs), which clearly has more overheads. However, the Julia loop time does not scale as the number of runs of the loop. In this case, 256*256 * 871 ns should be around 57 ms, but the result is about twice as bad (and almost three times slower than Python).

Experimenting a bit, I found a more efficient way in Julia by transposing the original array and using selectdim. This is the fastest I was able to do:

M = ones(Float64, (75, 256, 256))
out = similar(M);

@btime begin
    for j=1:256, i=1:256
        @inbounds out[:, i, j] = cumul_integrate(z, selectdim(selectdim(M, 2, i), 2, j), TrapezoidalFast())
    end
end

85.986 ms (786432 allocations: 201.00 MiB)

Unfortunately, it is still about twice as slow as Python. Am I doing something obviously wrong here?

Any tips on how to better run loops in Julia calling functions with lower dimensionality would be very welcome. In this case, the cumulative sum function is easy enough to code from scratch, but in the general case this is not always possible/desirable.

Thanks!

rdeits · April 21, 2020, 1:39pm

The Julia performance tips from the manual have a lot of really useful suggestions. In fact, they already cover all three suggestions I would have made:

Avoid global variables: Performance Tips · The Julia Language
Use views for your slices: Performance Tips · The Julia Language (this is what numpy does by default, which is one important difference between your Julia and Python code.
Access arrays in memory order: Performance Tips · The Julia Language (you’ve already discovered this when you found out that accessing out[:, i, j] was faster than out[i, j, :]).

DNF · April 21, 2020, 2:30pm

What is the purpose of using selectdim instead of ordinary indexing?

snowgum · April 21, 2020, 2:35pm

I now realise selectdim was a silly way to get a view. Changing that bit to instead view(M, :, i, j) does a little better - about 18% faster.

DNF · April 21, 2020, 2:47pm

Did you try wrapping the loop in a function and then passing your input and output arrays as arguments to that function?

snowgum · April 21, 2020, 3:10pm

Thanks for the tip! Indeed putting it into a function makes it about 3x faster, or almost 5x faster than my first naive attempt. So it appears the global variables was the key factor here.

By the way, are there more elegant ways to iterate over sub-indices of an array instead of the explicit double loop I did in the example? eachindex() is the obvious candidate, but it reduces to single elements in the array, while I need a slice (1D or otherwise). In this case, I replaced the following:

for j=1:256, i=1:256
    out[:, i, j] = do_something(view(M, :, i, j))
end

using CartesianIndices in this way:

for J in CartesianIndices(M[1, :, :])
    out[:, J] = do_something(view(M, :, J))
end

It seems to have a slight performance penalty over the first loop. Is there a better way to do this?

rdeits · April 21, 2020, 3:29pm

Remember that M[1, :, :] creates a copy of that part of the array, so you’re creating a big copy of an array just to construct a set of indices (which should be trivial to do just from the dimensions of the array).

Instead, you can use the fact that CartesianIndices also happily accepts a tuple of sizes (see ?CartesianIndices for more):

julia> M = rand(2, 2, 2)
2×2×2 Array{Float64,3}:
[:, :, 1] =
 0.0337042  0.245066
 0.348989   0.587323

[:, :, 2] =
 0.514845  0.772325
 0.297874  0.709676

julia> CartesianIndices(M[1, :, :])
2×2 CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}}:
 CartesianIndex(1, 1)  CartesianIndex(1, 2)
 CartesianIndex(2, 1)  CartesianIndex(2, 2)

julia> CartesianIndices((size(M, 2), size(M, 3)))
2×2 CartesianIndices{2,Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}}:
 CartesianIndex(1, 1)  CartesianIndex(1, 2)
 CartesianIndex(2, 1)  CartesianIndex(2, 2)

DNF · April 21, 2020, 3:32pm

Not at my computer, but doesn’t size(M, (2,3)) work?

Edit: Apparently not. Odd, since Matlab has that, and it’s normally the other way around

rdeits · April 21, 2020, 3:48pm

That sounds like a job for broadcasting!

julia> size.(Ref(M), (2, 3))
(2, 2)

snowgum · April 21, 2020, 3:57pm

Thank you for the suggestions! I do realise this can be done from the size of the array, just trying to find out if there was a simpler way to do it (doesn’t look like it).

@rdeits Yes, I understand that M[1, :, :1] creates a copy. In my example the performance penalty was minimal, but it is a sloppy way to go around it. Seems like CartesianIndices(view(M, 1, :, :)) is the way to go, at least the way with fewer characters.

yangcht · June 7, 2022, 6:55pm

This is beautiful! Thank you!

Topic		Replies	Views
Looping over rows and columns: what am I doing wrong? Performance	4	1105	February 17, 2020
Question about performance when iterating New to Julia	3	443	March 18, 2020
Faster array indexing with a logical array New to Julia array	12	1584	September 28, 2020
How to improve performance of sum() Performance question , fast-math , tullio , loopvectorization , sum	19	4801	July 19, 2021
Looping over float-range is faster than looping over int-range? Performance	9	1808	August 19, 2018

Looping over subset of array indices: Julia vs Python

Related topics