# Performance of loops

Julia has several ways to define loops or repeat code, each with advantages and disadvantages. I am looking at the performance of some also inspired by previous posts on this forum.

``````#!/usr/bin/env julia

# https://discourse.julialang.org/t/map-vs-loops-array-comprehensions-in-julia-1-0/16137
# Note, this thread is from 2018.

import BenchmarkTools
import Random

function loop_with_map(x)
x = map(a -> 2 * a, x)
return x
end

function loop_with_map_dot(x)
x .= map(a -> 2 * a, x)     # Slowest
return x
end

function loop_list_comprehension(x)
# A list comprehension creates a new local scope and is therefore slow.
x = [ 2 * val for val in x]
return x
end

function loop_list_comprehension_dot(x)
# A list comprehension creates a new local scope and is therefore slow.
x .= [ 2 * val for val in x]
return x
end

function for_loop(x)
for i in range(1, length(x))
x[i] = 2 * x[i]
end
return x
end

function for_loop_opt(x)
for i in eachindex(x)
@inbounds x[i] = 2 * x[i]
end
return x
end

@. x = 2 * x
return x
end

function setup_function(n)
return Random.rand(Int64, n)
end

for n in [100, 1000, 10_000, 100_000, 1_000_000]
println("\n\nn = ", n)
BenchmarkTools.@btime loop_with_map(x_vec)                  setup = (x_vec = setup_function(\$n)) # Slow
BenchmarkTools.@btime loop_with_map_dot(x_vec)              setup = (x_vec = setup_function(\$n)) # Very slow
BenchmarkTools.@btime loop_list_comprehension(x_vec)        setup = (x_vec = setup_function(\$n)) # Slow
BenchmarkTools.@btime loop_list_comprehension_dot(x_vec)    setup = (x_vec = setup_function(\$n)) # Very slow
BenchmarkTools.@btime for_loop(x_vec)                       setup = (x_vec = setup_function(\$n)) # Fast
BenchmarkTools.@btime for_loop_opt(x_vec)                   setup = (x_vec = setup_function(\$n)) # Fast
BenchmarkTools.@btime loop_via_broadcast(x_vec)             setup = (x_vec = setup_function(\$n)) # Fast
end
``````

Output:

``````2024_08_16 14:18:30 ~/temp \$ loop_performance.jl

n = 100
89.105 ns (1 allocation: 896 bytes)
163.902 ns (1 allocation: 896 bytes)
138.964 ns (1 allocation: 896 bytes)
163.821 ns (1 allocation: 896 bytes)
12.209 ns (0 allocations: 0 bytes)
11.278 ns (0 allocations: 0 bytes)
12.198 ns (0 allocations: 0 bytes)

n = 1000
1.049 ÎĽs (1 allocation: 7.94 KiB)
1.180 ÎĽs (1 allocation: 7.94 KiB)
1.119 ÎĽs (1 allocation: 7.94 KiB)
1.140 ÎĽs (1 allocation: 7.94 KiB)
80.231 ns (0 allocations: 0 bytes)
80.838 ns (0 allocations: 0 bytes)
81.080 ns (0 allocations: 0 bytes)

n = 10000
7.885 ÎĽs (2 allocations: 78.17 KiB)
11.209 ÎĽs (2 allocations: 78.17 KiB)
8.743 ÎĽs (2 allocations: 78.17 KiB)
11.228 ÎĽs (2 allocations: 78.17 KiB)
1.239 ÎĽs (0 allocations: 0 bytes)
1.215 ÎĽs (0 allocations: 0 bytes)
1.306 ÎĽs (0 allocations: 0 bytes)

n = 100000
116.843 ÎĽs (2 allocations: 781.30 KiB)
171.743 ÎĽs (2 allocations: 781.30 KiB)
119.491 ÎĽs (2 allocations: 781.30 KiB)
174.575 ÎĽs (2 allocations: 781.30 KiB)
15.298 ÎĽs (0 allocations: 0 bytes)
14.419 ÎĽs (0 allocations: 0 bytes)
15.221 ÎĽs (0 allocations: 0 bytes)

n = 1000000
1.428 ms (2 allocations: 7.63 MiB)
2.420 ms (2 allocations: 7.63 MiB)
1.420 ms (2 allocations: 7.63 MiB)
2.573 ms (2 allocations: 7.63 MiB)
696.670 ÎĽs (0 allocations: 0 bytes)
664.293 ÎĽs (0 allocations: 0 bytes)
698.923 ÎĽs (0 allocations: 0 bytes)
2024_08_16 14:21:31 ~/temp \$
``````

Questions:

1. `loop_with_map`: `map` has been reported to be slow before, but a long time ago. I am surprised that it is slow today. Am I doing something wrong?
2. `loop_with_map_dot`: This uses broadcasting to place the result of the calculation back in `x`. The idea is that no new memory for `x` has to be allocated; memory allocation is slow. Obviously this makes `loop_with_map_dot` slower, not faster. I think the reason is that `map` already allocates memory and that the broadcast makes that the old memory of `x` has to be filled instead of just making `x` point to the memory created by `map`. More on broadcasting below. Correct?
3. `loop_list_comprehension`: List comprehensions create their own scope which makes them slow. But I expected this creation of a new scope to take fixed time, i.e. independent of the number of times the loop is executed. What is going on?
4. `loop_list_comprehension_dot`: Just the same as in `loop_with_map_dot`, the broadcast serves to re-use the memory that `x` points to, but as a result all of the values have to be pumped over. Correct?
5. `for_loop_opt`: Skips index checks. Why is this faster than `loop_via_broadcast`? It suggests (falsely?) that `loop_via_broadcast` does a range check while this canâ€™t fail since there is only one container involved. If two with different lengths were involved, I can imagine that a range check makes sense.
6. `loop_via_broadcast`: here the re-use of the memory of `x` actually works. This has been clarified here. Now I wonder if this is the only time this works, in essence.

Yes.

You can use `map!` if you want a map that updates the input, instead of allocating a result vector. Definitely prefer `map!(f, x)` over `x .= map(f, x)`.

2 Likes

A two argument `map!` does not exist in `Base`.

2 Likes

`map!(f, x, x)`, then.
E.g.

``````map!(x -> x + x, x, x)
``````
1 Like
``````â”‚ Warning
â”‚
â”‚  Behavior can be unexpected when any mutated argument
â”‚  shares memory with any other argument.

``````
8 Likes

Link to #50824 that added that aliasing warning and has discussion and links to some related issues. While `map!(f, x, x)` is not currently guaranteed to work, it is relatively uncontroversial that it had ought to work in most circumstances. If youâ€™re feeling slightly brave, youâ€™ll probably never run into issues when the input is identical to the output (but more complicated aliasing could still be problematic). But broadcasting does guarantee that simple aliasing like `x .= 2 .* x` will work, so probably safer to use that (see also `broadcast!`).

Simple counter-example with severe consequences:

``````julia> v = zeros(2); map!(x->x+1, v, v); v
2-element Vector{Float64}:
1.0
1.0

julia> using SparseArrays

julia> v = spzeros(2); map!(x->x+1, v, v); v
2-element SparseVector{Float64, Int64} with 2 stored entries:
[1]  =  2.0
[2]  =  2.0
``````
5 Likes

Professionally, I never feel brave, sorry. My company makes extremely expensive machines and downtime of these machines is also very expensiveâ€¦ and rework of the products of these machines is also costly. So I am looking for the safest, securest ways for loops without loosing performance so I can promote such ways to colleagues.

4 Likes

`SparseArrays` has a specialized method for `map!`, which seems to cause the issue (but could potentially be improved, at least for the trivial case of `===` output with an input). But in any case,

``````julia> v = spzeros(2); broadcast!(x->x+1, v, v); v
2-element SparseVector{Float64, Int64} with 2 stored entries:
[1]  =  1.0
[2]  =  1.0
``````

so `broadcast!` is a suitable replacement for `map!`.

2 Likes

In that case, use one of the two above, with some added remarks:

• Add a trailing `!` to the function names to indicate mutation.
• Remove `@inbounds`, it already happens automatically with `eachindex`.
• you can use the nice update operator `*=`, so that `a = 2 * a` becomes `a *= 2`.
1 Like

1 Like

Yes, broadcasting is much more conservative when it comes to aliasing â€” `broadcast!` doesnâ€™t have that warning, but youâ€™ll also see that it (sometimes) allocates more than youâ€™d expect. It errs on the side of (sometimes) making surprising defensive copies, which you may not want in cases where youâ€™re chasing peak performance â€” which points us back on topic a bit more here.

In this case, with arrays that use the default broadcast implementation, the `loop_via_broadcast` sees that it iterates both the LHS and RHS â€śin stepâ€ť with eachother and thus new outputs wonâ€™t overwrite yet-to-be-read inputsâ€¦ and so the implementation neednâ€™t make that defensive copy.

The most important takeaways when looking at all of these different ways of performing `2x` is that:

• Assignment (the single `=`) is just about names that youâ€™re choosing to use to describe things. Choosing to change what a name means doesnâ€™t have anything to do with â€śmemory re-useâ€ť.
• All arguments are always evaluated â€” separately! â€” before being passed to a function. It can sometimes be tricky to see what makes for a separate function call â€” especially in the face of broadcast dots and other syntax sugars, though.

Finally, and perhaps most mind-bendingly,

• The superpower of Julia is that the code you write can be as fast â€” or faster â€” than its â€śbuiltins.â€ť Broadcasting is complicated, far more complicated than your simple loop. And simple is often fast.
6 Likes

The weak evidence about scope in list comprehensions is here.. Checking this is above my pay grade. Also, I think a simple function has its own scope, so I doubt now my own statement. I got it from Discourse somewhere though, but I canâ€™t find it back.

`loop_with_map` and `loop_list_comprehension` actually do not mutate (to also my surprise), the others do so.

1 Like

That is baffling indeed. Great catch. Although off-topic, if anybody understands this, please share!

`loop_with_map` and `loop_list_comprehension` are both of the form:

``````function f(x)
x = #... something
end
``````

This is just deciding you have a better use for the name `x` within the context of that function body. What `x` happened to be (or not) before the re-assignment has no effect. Itâ€™s just a name for us humans, no different than initially using the name â€śMattâ€ť for me and then deciding that youâ€™d rather use it for yourself. Youâ€™re not changing anything about either me or you by changing our nicknames!

This is just deciding you have a better use for the name `x` within the context of that function body. What `x` happened to be (or not) before the re-assignment has no effect. Itâ€™s just a name for us humans, no different than initially using the name â€śMattâ€ť for me and then deciding that youâ€™d rather use it for yourself. Youâ€™re not changing anything about either me or you by changing our nicknames!
[/quote]

Correct, I spoke in haste. If `x` is assigned a new object in the function, the old object is unchanged and found back after the function is terminated. If the object is essentially re-used, it is mutated and the changed values are found back on the caller side. I understand this.

List comprehensions create a closure, i.e. an anonymous function, so definitely a new scope. However, a new scope does not slow anything down (except possibly the compiler, though I doubt itâ€™s very significant).

To actually answer your question: Just write the loop. It is simple, expressive and fast. Everything else (like broadcasting) is eye candy/convenience in the end. You can still (and actually) should use the appropriate iteration function.

So the best version of your function (which incorporates the tips by @DNF):

``````function for_loop_opt!(x)
for i in eachindex(x)
x[i] *= 2
end
return x
end
``````

This

• is easy to read and understand even when you are not a Julia professional
• does not allocate
• has no possible out-of-bounds memory access (`@inbounds` is dangerous!)
• actually infers the `inbounds` property if the compiler can prove it (e.g. for standard `Array`s)
2 Likes

But I was specifically referring to `for_loop_opt` and `loop_via_broadcast`, which both do.