How much can hard-coded indexing be avoided in Julia?

What I’m asking, in other words, is whether there is any way to manipulate the indices in any of the three ways that @Henrique_Becker mentioned, and, of course, without having to explicitly code the loop in a way amenable to column-major order (to repeat the title of the thread: how much can hard-coded indexing be avoided in Julia for this example?).

Okay, I see what you’re saying.

I believe you could do that using pairs, which produces both an index and the corresponding element for each entry of a collection.

EDIT: see this example:

for (i, x) in pairs(collect(Iterators.product(1:10, 20:30)))
    println(i, "   ", x)
end
1 Like

I see. Is there anything better than this?

A = Array{Float64}(undef, 10, 20, 30)
for (idx, val) in pairs(A)
    i = idx.I[1]
    j = idx.I[2]
    k = idx.I[3]
    A[idx] = f(i, g(j, k))
end

(does it make any difference if instead of having A[idx], I have A[i, j, k]?)

julia> for idx in CartesianIndices(A)
           i,j,k = Tuple(idx)
           A[idx] = f(i, g(j,k))
       end

also an option?
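
For completeness, one more way to avoid index bookkeeping entirely is a multi-dimensional comprehension, where Julia picks the traversal order itself (a sketch, reusing the f and g from this thread):

```julia
f(x, y) = x * y
g(x, y) = x * y

# Julia handles the (column-major) traversal order internally
A = [f(i, g(j, k)) for i in 1:10, j in 1:20, k in 1:30]
size(A)  # (10, 20, 30)
```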

Hmm, I’m doing some timing and nothing yet beats option B below:

f(x, y) = x*y
g(x, y) = x*y

A = Array{Float64}(undef, 100, 200, 300)

# option A
@time for i in 1:100, j in 1:200, k in 1:300
    A[i, j, k] = f(i, g(j, k))
end
0.242069 seconds (5.99 M allocations: 91.385 MiB)

# option B
@time for k in 1:300, j in 1:200, i in 1:100
    A[i, j, k] = f(i, g(j, k))
end
0.211542 seconds (5.99 M allocations: 91.385 MiB, 16.37% gc time)

# option C
@time for (idx, _) in pairs(A)
    i = idx.I[1]
    j = idx.I[2]
    k = idx.I[3]
    A[idx] = f(i, g(j, k))
end
8.630138 seconds (131.72 M allocations: 4.109 GiB, 7.43% gc time)

# option D
@time for (idx, _) in pairs(A)
    i = idx.I[1]
    j = idx.I[2]
    k = idx.I[3]
    A[i, j, k] = f(i, g(j, k))
end
8.227987 seconds (131.72 M allocations: 4.109 GiB, 6.67% gc time)

# option E
@time for idx in CartesianIndices(A)
    i, j, k = Tuple(idx)
    A[idx] = f(i, g(j, k))
end
1.616649 seconds (35.72 M allocations: 1.069 GiB, 8.43% gc time)

Of course it’s probably impossible to beat careful hard-coded indexing, but I’m surprised by how much slower the implicit indexing options are.

You are breaking performance tips numbers 1:n. :wink:

Please wrap your code in functions, avoid non-const globals, don’t include compilation time, and don’t use @time for microbenchmarks. Use BenchmarkTools.jl.

5 Likes

I would expect all the alternatives with the correct loop order to perform identically (it might actually be easier to safely remove bounds checks for eachindex/pairs/CartesianIndices), and none of them should have any allocations or GC at all. You can tell from the tens to hundreds of millions of allocations that something is very wrong with the benchmarking.

2 Likes

Thanks for pointing that out. Let’s try it again:

using BenchmarkTools

f(x, y) = x*y
g(x, y) = x*y
const A = Array{Float64}(undef, 100, 200, 300)

# option A
function optA(A)
    for i in 1:100, j in 1:200, k in 1:300
        A[i, j, k] = f(i, g(j, k))
    end
end

# option B
function optB(A)
    for k in 1:300, j in 1:200, i in 1:100
        A[i, j, k] = f(i, g(j, k))
    end
end

# option C
function optC(A)
    for (idx, _) in pairs(A)
        i, j, k = Tuple(idx)
        A[idx] = f(i, g(j, k))
    end
end

# option D
function optD(A)
    for (idx, _) in pairs(A)
        i, j, k = Tuple(idx)
        A[i, j, k] = f(i, g(j, k))
    end
end

# option E
function optE(A)
    for idx in CartesianIndices(A)
        i, j, k = Tuple(idx)
        A[idx] = f(i, g(j, k))
    end
end
@benchmark optA(A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     19.437 ms (0.00% GC)
  median time:      23.235 ms (0.00% GC)
  mean time:        22.935 ms (0.00% GC)
  maximum time:     28.840 ms (0.00% GC)
  --------------
  samples:          218
  evals/sample:     1
@benchmark optB(A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.404 ms (0.00% GC)
  median time:      4.742 ms (0.00% GC)
  mean time:        4.897 ms (0.00% GC)
  maximum time:     8.086 ms (0.00% GC)
  --------------
  samples:          1021
  evals/sample:     1
@benchmark optC(A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.557 ms (0.00% GC)
  median time:      13.302 ms (0.00% GC)
  mean time:        13.348 ms (0.00% GC)
  maximum time:     17.289 ms (0.00% GC)
  --------------
  samples:          375
  evals/sample:     1
@benchmark optD(A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.493 ms (0.00% GC)
  median time:      13.546 ms (0.00% GC)
  mean time:        13.587 ms (0.00% GC)
  maximum time:     16.432 ms (0.00% GC)
  --------------
  samples:          368
  evals/sample:     1
@benchmark optE(A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.355 ms (0.00% GC)
  median time:      4.733 ms (0.00% GC)
  mean time:        4.851 ms (0.00% GC)
  maximum time:     7.987 ms (0.00% GC)
  --------------
  samples:          1031
  evals/sample:     1

So now options B and E are the most performant. Option E is the most amenable to human sanity.

I think this should be written (i, j, k) = Tuple(idx), like you did further down. Shouldn’t impact performance.

Maybe add @inbounds to see if it matters.

1 Like

Now nothing beats option E with @inbounds:
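
(The optAib through optEib definitions aren’t shown in the thread; presumably they are just the earlier functions with the loop wrapped in @inbounds, along these lines:)

```julia
f(x, y) = x * y
g(x, y) = x * y

# presumed shape of the `ib` variants: same loop as before, wrapped in @inbounds
function optBib(A)
    @inbounds for k in 1:300, j in 1:200, i in 1:100
        A[i, j, k] = f(i, g(j, k))
    end
end
```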

@benchmark optA(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     18.172 ms (0.00% GC)
#   median time:      20.093 ms (0.00% GC)
#   mean time:        20.238 ms (0.00% GC)
#   maximum time:     24.194 ms (0.00% GC)
#   --------------
#   samples:          248
#   evals/sample:     1

@benchmark optAib(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     20.944 ms (0.00% GC)
#   median time:      21.362 ms (0.00% GC)
#   mean time:        21.542 ms (0.00% GC)
#   maximum time:     24.136 ms (0.00% GC)
#   --------------
#   samples:          233
#   evals/sample:     1

@benchmark optB(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     4.528 ms (0.00% GC)
#   median time:      5.371 ms (0.00% GC)
#   mean time:        5.395 ms (0.00% GC)
#   maximum time:     8.741 ms (0.00% GC)
#   --------------
#   samples:          927
#   evals/sample:     1

@benchmark optBib(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     3.606 ms (0.00% GC)
#   median time:      3.724 ms (0.00% GC)
#   mean time:        3.789 ms (0.00% GC)
#   maximum time:     5.126 ms (0.00% GC)
#   --------------
#   samples:          1320
#   evals/sample:     1

@benchmark optC(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     12.782 ms (0.00% GC)
#   median time:      13.540 ms (0.00% GC)
#   mean time:        13.619 ms (0.00% GC)
#   maximum time:     16.399 ms (0.00% GC)
#   --------------
#   samples:          368
#   evals/sample:     1

@benchmark optCib(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     2.949 ms (0.00% GC)
#   median time:      3.240 ms (0.00% GC)
#   mean time:        3.322 ms (0.00% GC)
#   maximum time:     6.005 ms (0.00% GC)
#   --------------
#   samples:          1505
#   evals/sample:     1

@benchmark optD(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     11.209 ms (0.00% GC)
#   median time:      11.960 ms (0.00% GC)
#   mean time:        12.088 ms (0.00% GC)
#   maximum time:     18.776 ms (0.00% GC)
#   --------------
#   samples:          414
#   evals/sample:     1

@benchmark optDib(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     2.452 ms (0.00% GC)
#   median time:      2.898 ms (0.00% GC)
#   mean time:        2.919 ms (0.00% GC)
#   maximum time:     4.484 ms (0.00% GC)
#   --------------
#   samples:          1713
#   evals/sample:     1

@benchmark optE(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     4.383 ms (0.00% GC)
#   median time:      4.899 ms (0.00% GC)
#   mean time:        4.963 ms (0.00% GC)
#   maximum time:     7.618 ms (0.00% GC)
#   --------------
#   samples:          1008
#   evals/sample:     1

@benchmark optEib(A)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     2.374 ms (0.00% GC)
#   median time:      2.434 ms (0.00% GC)
#   mean time:        2.515 ms (0.00% GC)
#   maximum time:     4.093 ms (0.00% GC)
#   --------------
#   samples:          1987
#   evals/sample:     1

At the risk of a digression, is there a way to activate/deactivate bounds checking as a startup option for Julia? I’d want bounds checking on while actively coding, but off when running the .jl file, without having to manually add @inbounds to every loop. Or, use @inbounds in all the loops, but have a startup option to ignore it.

Did you try julia --help? :slightly_smiling_face:

3 Likes

there is such a thing? :exploding_head:
thank you!

If I may make a suggestion: julia --help gives explicit macro names for inline and math-mode, but not for check-bounds. Perhaps it would be useful to explicitly state that --check-bounds overrides @inbounds declarations?

from julia --help :

--check-bounds={yes|no}   Emit bounds checks always or never (ignoring declarations)

looks explicit.

Compare it with the entries for inline and math-mode:

--inline={yes|no}         Control whether inlining is permitted, including overriding @inline declarations
--math-mode={ieee,fast}   Disallow or enable unsafe floating point optimizations (overrides @fastmath declaration)

How to make the declaration for bounds checks is not explicit, IMO. At least it is not consistent with the other two entries.

3 Likes

fair enough.

This is a valid point; can you make a PR? One would just need to change this line: julia/jloptions.c at master · JuliaLang/julia · GitHub. I think it is a good contribution and not very complicated, so it should be accepted reasonably fast.

3 Likes

I would be a bit wary of removing bounds checks globally. If there is a bug in your indexing, your program could suddenly read outside the array, crash, or do other unexpected things.

@inbounds is for when you are very certain that no indexing can go wrong, and even then, the compiler can often elide the checks for you automatically.
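
For instance (a sketch, using a hypothetical helper double!), iterating with eachindex lets the compiler prove every access is in bounds, so the checks can be dropped without any @inbounds annotation:

```julia
# hypothetical helper: doubles every element in place
function double!(A)
    for i in eachindex(A)   # the compiler can prove i is always in bounds
        A[i] = 2 * A[i]
    end
    return A
end
```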

On my laptop, there’s no significant difference with or without @inbounds if I do

function bar!(A)
    for ind in CartesianIndices(A)
        (i, j, k) = Tuple(ind)
        A[ind] = f(i, g(j, k))  # using ind directly instead of (i,j,k)
    end
end

I prefer being explicit about @inbounds, or trusting the compiler to elide the checks in the right places.

Done! I can’t believe I’m contributing to core Julia :partying_face:

9 Likes

I see. So you use an explicit @inbounds or none at all, but never start Julia with --check-bounds=no? Then how come I see such a large difference between optE and optEib?

function optE(A)
    for idx in CartesianIndices(A)
        i, j, k = Tuple(idx)
        A[idx] = f(i, g(j, k))
    end
end

function optEib(A)
    for idx in CartesianIndices(A)
        i, j, k = Tuple(idx)
        @inbounds A[idx] = f(i, g(j, k))
    end
end

julia> @benchmark optE(A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.376 ms (0.00% GC)
  median time:      4.516 ms (0.00% GC)
  mean time:        4.578 ms (0.00% GC)
  maximum time:     6.253 ms (0.00% GC)
  --------------
  samples:          1092
  evals/sample:     1

julia> @benchmark optEib(A)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.946 ms (0.00% GC)
  median time:      3.239 ms (0.00% GC)
  mean time:        3.259 ms (0.00% GC)
  maximum time:     4.865 ms (0.00% GC)
  --------------
  samples:          1534
  evals/sample:     1