# Vector of indices vs. for loop to assign values in an array

Consider the following initialisation:

``````
A = zeros(3, 3)                             # 3×3 array of zeros
cartId = CartesianIndices(A)[[1, 2, 5, 7]]  # Cartesian indices at which values will be assigned in A
value = rand(length(cartId))                # the values to assign in A at the locations stored in cartId
``````
• Which of the two code snippets below (#1 or #2) is faster / less resource-consuming?

• How can one find out?

Code #1:

``````
# 1
for i = 1:length(cartId)
    A[cartId[i]] = value[i]
end
``````

Code #2:

``````
# 2
A[cartId] = value
``````
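Before timing anything, it is worth checking that the two snippets really do the same thing. A minimal sanity check (a sketch; note the comma-separated index vector, which keeps `cartId` a plain `Vector` of `CartesianIndex`):

```julia
# Sanity check: both ways of writing the assignment produce the same array.
A1 = zeros(3, 3)
A2 = zeros(3, 3)
cartId = CartesianIndices(A1)[[1, 2, 5, 7]]  # linear positions 1, 2, 5, 7
value = rand(length(cartId))

for i in eachindex(cartId)   # code #1: explicit loop
    A1[cartId[i]] = value[i]
end
A2[cartId] = value           # code #2: vectorised assignment

@assert A1 == A2
```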

To find out, you typically benchmark functions that perform the operations. I have modified your example a bit, so that the benchmark is easier to interpret:

``````
using BenchmarkTools

n = 100
A = zeros(n, n)
cartId = CartesianIndices(A)[rand(1:n*n,n)]
value = rand(length(cartId))

function fill1(A, cartId, value)
    for i = 1:length(cartId)
        A[cartId[i]] = value[i]
    end
end

function fill2(A, cartId, value)
    A[cartId] = value
end

@benchmark fill1($A, $cartId, $value)

@benchmark fill2($A, $cartId, $value)
``````

The results look like the following.

For `fill1`:

``````
BenchmarkTools.Trial: 10000 samples with 961 evaluations.
 Range (min … max):  86.368 ns … 118.002 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     92.560 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   92.904 ns ±   2.752 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  86.4 ns         Histogram: frequency by time          102 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
``````

For `fill2`:

``````
BenchmarkTools.Trial: 10000 samples with 867 evaluations.
 Range (min … max):  139.677 ns … 212.803 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     142.330 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   143.855 ns ±   4.422 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  140 ns           Histogram: frequency by time          157 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
``````

So, the loop is quite a bit faster.
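For a quicker look than the full `@benchmark` report, BenchmarkTools also provides `@btime`, which just prints the minimum time and the allocation count. A sketch, reusing the same setup:

```julia
using BenchmarkTools

n = 100
A = zeros(n, n)
cartId = CartesianIndices(A)[rand(1:n*n, n)]
value = rand(length(cartId))

function fill1(A, cartId, value)
    for i in eachindex(cartId)
        A[cartId[i]] = value[i]
    end
end
fill2(A, cartId, value) = (A[cartId] = value)

@btime fill1($A, $cartId, $value)  # prints minimum time and allocations
@btime fill2($A, $cartId, $value)
```

As with `@benchmark`, the `$` interpolation avoids benchmarking the cost of global-variable access.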


Thank you.

How can we understand that the loop is faster here? How does the process differ between `fill1` and `fill2`?

The why is not so easy to explain; for me it's always some lengthy analysis of the lowered code.

In general, loops in Julia are fast. Both of your variants boil down to a loop at some point; the difference lies in how well the compiler can generate fast code from the given Julia code. (My understanding here is not good enough to be more than vague.)

When I want to know what happens, I use `@code_lowered`:

``````
julia> @code_lowered fill1(A, cartId, value)
CodeInfo(
1 ─ %1  = Main.length(cartId)
│   %2  = 1:%1
│         @_5 = Base.iterate(%2)
│   %4  = @_5 === nothing
│   %5  = Base.not_int(%4)
└──       goto #4 if not %5
2 ┄ %7  = @_5
│         i = Core.getfield(%7, 1)
│   %9  = Core.getfield(%7, 2)
│   %10 = Base.getindex(value, i)
│   %11 = Base.getindex(cartId, i)
│         Base.setindex!(A, %10, %11)
│         @_5 = Base.iterate(%2, %9)
│   %14 = @_5 === nothing
│   %15 = Base.not_int(%14)
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return nothing
)

julia> @code_lowered fill2(A, cartId, value)
CodeInfo(
1 ─     Base.setindex!(A, value, cartId)
└──     return value
)
``````

Well, `fill2` looks much shorter, but:

``````
julia> @code_lowered Base.setindex!(A, value, cartId)
CodeInfo(
1 ─      nothing
│   %2 = Base.IndexStyle(A)
│   %3 = Core.tuple(%2, A)
│        Core._apply_iterate(Base.iterate, Base.error_if_canonical_setindex, %3, I)
│   %5 = Base.IndexStyle(A)
│   %6 = Core.tuple(%5, A, v)
│   %7 = Base.to_indices(A, I)
│   %8 = Core._apply_iterate(Base.iterate, Base._setindex!, %6, %7)
└──      return %8
)
``````

At least we can see that it ends up iterating, because of

``````
│   %8 = Core._apply_iterate(Base.iterate, Base._setindex!, %6, %7)
``````

It sounds like a loop, and it is a loop. You can go further by setting some temporary variables in the REPL accordingly and seeing what `@code_lowered Core._apply_iterate ...` is lowered to.
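You can also jump to later stages of compilation instead: `@code_typed` shows the code after type inference and the optimiser, and `@code_llvm` / `@code_native` go all the way down to LLVM IR and machine code. A sketch (repeating `fill1` so the snippet is self-contained):

```julia
function fill1(A, cartId, value)
    for i = 1:length(cartId)
        A[cartId[i]] = value[i]
    end
end

A = zeros(3, 3)
cartId = CartesianIndices(A)[[1, 2, 5]]
value = rand(length(cartId))

@code_typed fill1(A, cartId, value)    # CodeInfo after inference and optimisation
# @code_llvm   fill1(A, cartId, value) # LLVM IR
# @code_native fill1(A, cartId, value) # machine code
```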

Itβs a bit tedious, so I skip this. But what `fill1` tells you is, that it checks for indices to be correct. As we already know this we can switch this off to squeeze a bit more of performance from it:

``````
function fill1b(A, cartId, value)
    @inbounds for i = 1:length(cartId)
        A[cartId[i]] = value[i]
    end
end

julia> @benchmark fill1($A, $cartId, $value)
BenchmarkTools.Trial: 10000 samples with 972 evaluations.
 Range (min … max):  87.140 ns … 125.926 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     91.770 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   91.517 ns ±   2.401 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  87.1 ns       Histogram: log(frequency) by time      99.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark fill1b($A, $cartId, $value)
BenchmarkTools.Trial: 10000 samples with 987 evaluations.
 Range (min … max):  60.689 ns … 101.013 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     66.160 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   66.188 ns ±   2.332 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  60.7 ns         Histogram: frequency by time         73.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
``````
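One caveat worth adding (my note, not part of the measurements above): `@inbounds` removes the bounds check entirely, so it is only safe when every index is guaranteed to be valid. With an invalid index the program exhibits undefined behaviour, typically memory corruption or a crash, instead of throwing a clean `BoundsError`. A sketch:

```julia
function fill1b(A, cartId, value)
    @inbounds for i in eachindex(cartId)
        A[cartId[i]] = value[i]
    end
end

A = zeros(3, 3)
good = CartesianIndices(A)[[1, 2, 5]]   # all valid for a 3×3 array
value = rand(length(good))

fill1b(A, good, value)                  # safe: every index is in bounds
@assert [A[c] for c in good] == value

# With invalid indices the same call could silently corrupt memory,
# so validate the indices once up front:
@assert all(in(CartesianIndices(A)), good)
```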

This was only to distract you from the question of why `fill1` is faster than `fill2`.

Performance questions typically attract many people, so let's just wait for the experts.


What you are seeing is that with the loop, the compiler is able to prove that it won't access out-of-bounds memory, while the vectorised assignment is still doing bounds checks. I was able to see that by inspecting the assembly code.
By adding an `@inbounds` in `fill2` you can get better performance:

``````
julia> function fill3(A, cartId, value)
           @inbounds A[cartId] = value
       end

julia> @benchmark fill3($A, $cartId, $value)
BenchmarkTools.Trial: 10000 samples with 978 evaluations.
 Range (min … max):  67.740 ns … 108.896 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     68.124 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   68.275 ns ±   1.277 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  67.7 ns         Histogram: frequency by time         72.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark fill1($A, $cartId, $value)
BenchmarkTools.Trial: 10000 samples with 972 evaluations.
 Range (min … max):  76.175 ns … 123.414 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     76.903 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   77.101 ns ±   1.600 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  76.2 ns         Histogram: frequency by time           81 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
``````