# Vector of indices vs. for loop to assign values in an array

Consider the following initialisation:

``````
A = zeros(3, 3)                             # 3×3 array of zeros
cartId = CartesianIndices(A)[[1, 2, 5, 7]]  # Cartesian indices at which values will be assigned in A
value = rand(length(cartId))                # the values to assign in A at the locations stored in cartId
``````
• Which of the two code snippets below (#1 or #2) is faster / less resource-consuming?

• How can one find out?

Code #1:

``````
# 1
for i = 1:length(cartId)
    A[cartId[i]] = value[i]
end
``````

Code #2:

``````
# 2
A[cartId] = value
``````
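Before timing anything, it is worth checking that the two snippets really do the same thing. A minimal sanity check (a sketch; note the comma-separated index vector, which keeps `cartId` a plain `Vector` of `CartesianIndex`):

```julia
# Sanity check: both ways of writing the assignment produce the same array.
A1 = zeros(3, 3)
A2 = zeros(3, 3)
cartId = CartesianIndices(A1)[[1, 2, 5, 7]]  # linear positions 1, 2, 5, 7
value = rand(length(cartId))

for i in eachindex(cartId)   # code #1: explicit loop
    A1[cartId[i]] = value[i]
end
A2[cartId] = value           # code #2: vectorised assignment

@assert A1 == A2
```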

To find out, you typically benchmark functions that perform the operations. I have modified your example a bit, so that the benchmark is easier to interpret:

``````
using BenchmarkTools

n = 100
A = zeros(n, n)
cartId = CartesianIndices(A)[rand(1:n*n,n)]
value = rand(length(cartId))

function fill1(A, cartId, value)
    for i = 1:length(cartId)
        A[cartId[i]] = value[i]
    end
end

function fill2(A, cartId, value)
    A[cartId] = value
end

@benchmark fill1($A, $cartId, $value)

@benchmark fill2($A, $cartId, $value)
``````

The results look like the following.

For `fill1`:

``````
BenchmarkTools.Trial: 10000 samples with 961 evaluations.
 Range (min … max):  86.368 ns … 118.002 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     92.560 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   92.904 ns ±   2.752 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  86.4 ns         Histogram: frequency by time          102 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
``````

For `fill2`:

``````
BenchmarkTools.Trial: 10000 samples with 867 evaluations.
 Range (min … max):  139.677 ns … 212.803 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     142.330 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   143.855 ns ±   4.422 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  140 ns           Histogram: frequency by time          157 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
``````

So, the loop is quite a bit faster.
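For a quicker look than the full `@benchmark` report, BenchmarkTools also provides `@btime`, which just prints the minimum time and the allocation count. A sketch, reusing the same setup:

```julia
using BenchmarkTools

n = 100
A = zeros(n, n)
cartId = CartesianIndices(A)[rand(1:n*n, n)]
value = rand(length(cartId))

function fill1(A, cartId, value)
    for i in eachindex(cartId)
        A[cartId[i]] = value[i]
    end
end
fill2(A, cartId, value) = (A[cartId] = value)

@btime fill1($A, $cartId, $value)  # prints minimum time and allocations
@btime fill2($A, $cartId, $value)
```

As with `@benchmark`, the `$` interpolation avoids benchmarking the cost of global-variable access.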


Thank you.

How can we understand that the loop is faster here? How does the process differ between `fill1` and `fill2`?

The why is not so easy to explain; for me it's always some lengthy analysis of the lowered code.

In general, loops in Julia are fast. Both of your variants boil down to a loop at some point; the difference lies in how well the compiler can generate fast code from the given Julia code. (My understanding here is not good enough to be more than vague.)

When I want to know what happens, I use `@code_lowered`:

``````
julia> @code_lowered fill1(A, cartId, value)
CodeInfo(
1 ─ %1  = Main.length(cartId)
│   %2  = 1:%1
│         @_5 = Base.iterate(%2)
│   %4  = @_5 === nothing
│   %5  = Base.not_int(%4)
└──       goto #4 if not %5
2 ┄ %7  = @_5
│         i = Core.getfield(%7, 1)
│   %9  = Core.getfield(%7, 2)
│   %10 = Base.getindex(value, i)
│   %11 = Base.getindex(cartId, i)
│         Base.setindex!(A, %10, %11)
│         @_5 = Base.iterate(%2, %9)
│   %14 = @_5 === nothing
│   %15 = Base.not_int(%14)
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return nothing
)

julia> @code_lowered fill2(A, cartId, value)
CodeInfo(
1 ─     Base.setindex!(A, value, cartId)
└──     return value
)
``````

Well, `fill2` looks much shorter, but:

``````
julia> @code_lowered Base.setindex!(A, value, cartId)
CodeInfo(
1 ─      nothing
│   %2 = Base.IndexStyle(A)
│   %3 = Core.tuple(%2, A)
│        Core._apply_iterate(Base.iterate, Base.error_if_canonical_setindex, %3, I)
│   %5 = Base.IndexStyle(A)
│   %6 = Core.tuple(%5, A, v)
│   %7 = Base.to_indices(A, I)
│   %8 = Core._apply_iterate(Base.iterate, Base._setindex!, %6, %7)
└──      return %8
)
``````

At least we can see that it ends up iterating, because of

``````
│   %8 = Core._apply_iterate(Base.iterate, Base._setindex!, %6, %7)
``````

It sounds like a loop, and it is a loop. You can go further by setting some temporary variables in the REPL accordingly and seeing what `@code_lowered Core._apply_iterate ...` is lowered to.
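You can also jump to later stages of compilation instead: `@code_typed` shows the code after type inference and the optimiser, and `@code_llvm` / `@code_native` go all the way down to LLVM IR and machine code. A sketch (repeating `fill1` so the snippet is self-contained):

```julia
function fill1(A, cartId, value)
    for i = 1:length(cartId)
        A[cartId[i]] = value[i]
    end
end

A = zeros(3, 3)
cartId = CartesianIndices(A)[[1, 2, 5]]
value = rand(length(cartId))

@code_typed fill1(A, cartId, value)    # CodeInfo after inference and optimisation
# @code_llvm   fill1(A, cartId, value) # LLVM IR
# @code_native fill1(A, cartId, value) # machine code
```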

Itβs a bit tedious, so I skip this. But what `fill1` tells you is, that it checks for indices to be correct. As we already know this we can switch this off to squeeze a bit more of performance from it:

``````
function fill1b(A, cartId, value)
    @inbounds for i = 1:length(cartId)
        A[cartId[i]] = value[i]
    end
end

julia> @benchmark fill1($A, $cartId, $value)
BenchmarkTools.Trial: 10000 samples with 972 evaluations.
 Range (min … max):  87.140 ns … 125.926 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     91.770 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   91.517 ns ±   2.401 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  87.1 ns       Histogram: log(frequency) by time      99.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark fill1b($A, $cartId, $value)
BenchmarkTools.Trial: 10000 samples with 987 evaluations.
 Range (min … max):  60.689 ns … 101.013 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     66.160 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   66.188 ns ±   2.332 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  60.7 ns         Histogram: frequency by time         73.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
``````
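One caveat worth adding (my note, not part of the measurements above): `@inbounds` removes the bounds check entirely, so it is only safe when every index is guaranteed to be valid. With an invalid index the program exhibits undefined behaviour, typically memory corruption or a crash, instead of throwing a clean `BoundsError`. A sketch:

```julia
function fill1b(A, cartId, value)
    @inbounds for i in eachindex(cartId)
        A[cartId[i]] = value[i]
    end
end

A = zeros(3, 3)
good = CartesianIndices(A)[[1, 2, 5]]   # all valid for a 3×3 array
value = rand(length(good))

fill1b(A, good, value)                  # safe: every index is in bounds
@assert [A[c] for c in good] == value

# With invalid indices the same call could silently corrupt memory,
# so validate the indices once up front:
@assert all(in(CartesianIndices(A)), good)
```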

This was only to distract you from the question of why `fill1` is faster than `fill2`.

Performance questions typically attract many people, so let's just wait for the experts.


What you are seeing is that with the loop, the compiler is able to prove that it won't access out-of-bounds memory, while the vectorised assignment is still doing bounds checks. I was able to see that by inspecting the assembly code.
By adding an `@inbounds` in `fill2` you can get better performance:

``````
julia> function fill3(A, cartId, value)
           @inbounds A[cartId] = value
       end

julia> @benchmark fill3($A, $cartId, $value)
BenchmarkTools.Trial: 10000 samples with 978 evaluations.
 Range (min … max):  67.740 ns … 108.896 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     68.124 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   68.275 ns ±   1.277 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  67.7 ns         Histogram: frequency by time         72.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark fill1($A, $cartId, $value)
BenchmarkTools.Trial: 10000 samples with 972 evaluations.
 Range (min … max):  76.175 ns … 123.414 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     76.903 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   77.101 ns ±   1.600 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  76.2 ns         Histogram: frequency by time           81 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
``````