I am new to Julia and currently trying to emulate some practices I’m used to in Python. One problem I am struggling with is how to best iterate over some of the indices of an array. For example, consider an array with size `(nx, ny, nz)`

. I want to apply a function that works on 1D arrays, so I do a loop over `nx`

and `ny`

to work on a subset of the array, say `[i, j, :]`

. What is the best way to achieve this?

Here is an example for some code with a cumulative sum function, where I find my best Julia code is about twice as slow as Python. I can’t quite figure out why exactly Python is doing better here. Here’s the Python code and timings, for reference:

```
import numpy as np
from scipy.integrate.quadrature import cumtrapz
(NX, NY, NZ) = (256, 256, 75)
z = np.linspace(0., 1.e3, NZ)
M = np.ones((NX, NY, NZ), dtype='f8')
out = np.empty_like(M)
```

```
%timeit cumtrapz(M[100, 100], x=z)
```

12.4 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

```
%timeit cumtrapz(M, x=z, axis=2)
```

44.3 ms ± 2.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

With Julia, I’m using `NumericalIntegration.jl`

which provides a function called `cumul_integrate`

that works similarly to `cumtrapz`

. Here is my first try with Julia:

```
import NumericalIntegration: integrate, cumul_integrate, TrapezoidalFast
using BenchmarkTools
(NX, NY, NZ) = (256, 256, 75)
z = LinRange(0., 1.e3, NZ)
M = ones(Float64, (NX, NY, NZ))
out = similar(M);
```

```
@btime cumul_integrate(z, M[100, 100, :], TrapezoidalFast());
```

871.545 ns (6 allocations: 3.63 KiB)

```
@btime begin
for j=1:256, i=1:256
out[i, j, :] = cumul_integrate(z, M[i, j, :], TrapezoidalFast())
end
end
```

127.190 ms (458752 allocations: 234.00 MiB)

You can see that in the case of only one column, the ~871 ns time is quite a bit faster than Python (12.4 µs), which clearly has more overheads. However, the Julia loop time does not scale as the number of runs of the loop. In this case, 256*256 * 871 ns should be around 57 ms, but the result is about twice as bad (and almost three times slower than Python).

Experimenting a bit, I found a more efficient way in Julia by transposing the original array and using `selectdim`

. This is the fastest I was able to do:

```
M = ones(Float64, (75, 256, 256))
out = similar(M);
```

```
@btime begin
for j=1:256, i=1:256
@inbounds out[:, i, j] = cumul_integrate(z, selectdim(selectdim(M, 2, i), 2, j), TrapezoidalFast())
end
end
```

85.986 ms (786432 allocations: 201.00 MiB)

Unfortunately, it is still about twice as slow as Python. Am I doing something obviously wrong here?

Any tips on how to better run loops in Julia calling functions with lower dimensionality would be very welcome. In this case, the cumulative sum function is easy enough to code from scratch, but in the general case this is not always possible/desirable.

Thanks!