Julia has several ways to define loops or repeat code, each with advantages and disadvantages. I am looking at the performance of some also inspired by previous posts on this forum.
#!/usr/bin/env julia
# https://discourse.julialang.org/t/map-vs-loops-array-comprehensions-in-julia-1-0/16137
# Note, this thread is from 2018.
import BenchmarkTools
import Random
function loop_with_map(x)
x = map(a -> 2 * a, x)
return x
end
function loop_with_map_dot(x)
x .= map(a -> 2 * a, x) # Slowest
return x
end
function loop_list_comprehension(x)
# A list comprehension creates a new local scope and is therefore slow.
x = [ 2 * val for val in x]
return x
end
function loop_list_comprehension_dot(x)
# A list comprehension creates a new local scope and is therefore slow.
x .= [ 2 * val for val in x]
return x
end
function for_loop(x)
for i in range(1, length(x))
x[i] = 2 * x[i]
end
return x
end
function for_loop_opt(x)
for i in eachindex(x)
@inbounds x[i] = 2 * x[i]
end
return x
end
function loop_via_broadcast(x)
@. x = 2 * x
return x
end
function setup_function(n)
return Random.rand(Int64, n)
end
for n in [100, 1000, 10_000, 100_000, 1_000_000]
println("\n\nn = ", n)
BenchmarkTools.@btime loop_with_map(x_vec) setup = (x_vec = setup_function($n)) # Slow
BenchmarkTools.@btime loop_with_map_dot(x_vec) setup = (x_vec = setup_function($n)) # Very slow
BenchmarkTools.@btime loop_list_comprehension(x_vec) setup = (x_vec = setup_function($n)) # Slow
BenchmarkTools.@btime loop_list_comprehension_dot(x_vec) setup = (x_vec = setup_function($n)) # Very slow
BenchmarkTools.@btime for_loop(x_vec) setup = (x_vec = setup_function($n)) # Fast
BenchmarkTools.@btime for_loop_opt(x_vec) setup = (x_vec = setup_function($n)) # Fast
BenchmarkTools.@btime loop_via_broadcast(x_vec) setup = (x_vec = setup_function($n)) # Fast
end
loop_with_map: map has been reported to be slow before, but a long time ago. I am surprised that it is slow today. Am I doing something wrong?
loop_with_map_dot: This uses broadcasting to place the result of the calculation back in x. The idea is that no new memory for x has to be allocated; memory allocation is slow. Obviously this makes loop_with_map_dot slower, not faster. I think the reason is that map already allocates memory and that the broadcast makes that the old memory of x has to be filled instead of just making x point to the memory created by map. More on broadcasting below. Correct?
loop_list_comprehension: List comprehensions create their own scope which makes them slow. But I expected this creation of a new scope to take fixed time, i.e. independent of the number of times the loop is executed. What is going on?
loop_list_comprehension_dot: Just the same as in loop_with_map_dot, the broadcast serves to re-use the memory that x points to, but as a result all of the values have to be pumped over. Correct?
for_loop_opt: Skips index checks. Why is this faster than loop_via_broadcast? It suggests (falsely?) that loop_via_broadcastdoes a range check while this can’t fail since there is only one container involved. If two with different lengths were involved, I can imagine that a range check makes sense.
loop_via_broadcast: here the re-use of the memory of x actually works. This has been clarified here. Now I wonder if this is the only time this works, in essence.
Link to #50824 that added that aliasing warning and has discussion and links to some related issues. While map!(f, x, x) is not currently guaranteed to work, it is relatively uncontroversial that it had ought to work in most circumstances. If you’re feeling slightly brave, you’ll probably never run into issues when the input is identical to the output (but more complicated aliasing could still be problematic). But broadcasting does guarantee that simple aliasing like x .= 2 .* x will work, so probably safer to use that (see also broadcast!).
Professionally, I never feel brave, sorry. My company makes extremely expensive machines and downtime of these machines is also very expensive… and rework of the products of these machines is also costly. So I am looking for the safest, securest ways for loops without loosing performance so I can promote such ways to colleagues.
SparseArrays has a specialized method for map!, which seems to cause the issue (but could potentially be improved, at least for the trivial case of === output with an input). But in any case,
julia> v = spzeros(2); broadcast!(x->x+1, v, v); v
2-element SparseVector{Float64, Int64} with 2 stored entries:
[1] = 1.0
[2] = 1.0
Yes, broadcasting is much more conservative when it comes to aliasing — broadcast!doesn’t have that warning, but you’ll also see that it (sometimes) allocates more than you’d expect. It errs on the side of (sometimes) making surprising defensive copies, which you may not want in cases where you’re chasing peak performance — which points us back on topic a bit more here.
In this case, with arrays that use the default broadcast implementation, the loop_via_broadcast sees that it iterates both the LHS and RHS “in step” with eachother and thus new outputs won’t overwrite yet-to-be-read inputs… and so the implementation needn’t make that defensive copy.
The most important takeaways when looking at all of these different ways of performing 2x is that:
Assignment (the single =) is just about names that you’re choosing to use to describe things. Choosing to change what a name means doesn’t have anything to do with “memory re-use”.
All arguments are always evaluated — separately! — before being passed to a function. It can sometimes be tricky to see what makes for a separate function call — especially in the face of broadcast dots and other syntax sugars, though.
Finally, and perhaps most mind-bendingly,
The superpower of Julia is that the code you write can be as fast — or faster — than its “builtins.” Broadcasting is complicated, far more complicated than your simple loop. And simple is often fast.
The weak evidence about scope in list comprehensions is here.. Checking this is above my pay grade. Also, I think a simple function has its own scope, so I doubt now my own statement. I got it from Discourse somewhere though, but I can’t find it back.
loop_with_map and loop_list_comprehension are both of the form:
function f(x)
x = #... something
end
This is just deciding you have a better use for the name x within the context of that function body. What x happened to be (or not) before the re-assignment has no effect. It’s just a name for us humans, no different than initially using the name “Matt” for me and then deciding that you’d rather use it for yourself. You’re not changing anything about either me or you by changing our nicknames!
This is just deciding you have a better use for the name x within the context of that function body. What x happened to be (or not) before the re-assignment has no effect. It’s just a name for us humans, no different than initially using the name “Matt” for me and then deciding that you’d rather use it for yourself. You’re not changing anything about either me or you by changing our nicknames!
[/quote]
Correct, I spoke in haste. If x is assigned a new object in the function, the old object is unchanged and found back after the function is terminated. If the object is essentially re-used, it is mutated and the changed values are found back on the caller side. I understand this.
List comprehensions create a closure, i.e. an anonymous function, so definitely a new scope. However, a new scope does not slow anything down (except possibly the compiler, though I doubt it’s very significant).
To actually answer your question: Just write the loop. It is simple, expressive and fast. Everything else (like broadcasting) is eye candy/convenience in the end. You can still (and actually) should use the appropriate iteration function.
So the best version of your function (which incorporates the tips by @DNF):
function for_loop_opt!(x)
for i in eachindex(x)
x[i] *= 2
end
return x
end
This
is easy to read and understand even when you are not a Julia professional
does not allocate
has no possible out-of-bounds memory access (@inbounds is dangerous!)
actually infers the inbounds property if the compiler can prove it (e.g. for standard Arrays)