julia> x = Random.rand(Int64, 2)
2-element Vector{Int64}:
-260682396598444552
-7056639715971435629
julia> loop_with_map(x);
julia> x
2-element Vector{Int64}:
-260682396598444552
-7056639715971435629
# x still the same
but
julia> loop_with_map_dot(x);
julia> x
2-element Vector{Int64}:
-521364793196889104
4333464641766680358
# x has mutated
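For context, here is a minimal sketch of what the two functions above might look like. These definitions are my assumption, not necessarily the original code; the point is that `map` returns a new array, so assigning it to the local variable only rebinds the name, while `.=` writes through to the caller's array.

```julia
# Hypothetical reconstructions (assumed, not the original definitions)
function loop_with_map(x)
    x = map(v -> 2v, x)   # map allocates a new array; rebinds the local x only
    return x              # the caller's array is untouched
end

function loop_with_map_dot(x)
    x .= map(v -> 2v, x)  # .= copies the result into the caller's array in place
    return x
end

y = [1, 2, 3]
loop_with_map(y)      # y is still [1, 2, 3]
loop_with_map_dot(y)  # y is now [2, 4, 6]
```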
A meta comment is that you almost never want a loop to perform an operation as simple as multiplying elements by 2 if performance matters. Your granularity is too small. You want to combine loops so that you make as few passes over your arrays as possible and do lots of work per iteration.
Coarse-graining your loops also helps you to eliminate temporary arrays.
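As an illustration of coarse-graining (my own sketch, not code from this thread): two fine-grained passes allocate a temporary and traverse memory twice, while one fused loop does all the per-element work in a single pass.

```julia
# Two fine-grained passes: allocates a temporary and walks the array twice
two_passes(x) = (tmp = 2 .* x; tmp .+ 1)

# One coarse-grained pass: all the work per element happens in a single loop,
# writing into a caller-supplied output and allocating nothing
function one_pass!(out, x)
    @inbounds for i in eachindex(out, x)
        out[i] = 2 * x[i] + 1
    end
    return out
end
```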
(Languages like Matlab and Python teach people the bad habit of decomposing every algorithm into sequences of simple “vectorized” steps, to work around the performance limitations of their loops.)
It doesn't sound to me like GPU computing matters here, and broadcasting then bears the risk of being slower, is generally less expressive than loops, and is possibly harder to read.
If this were really true, then broadcasting would have no point at all. In many cases broadcasting is just as fast, safer, and definitely easier to read. Not always, but often.
I see that my short statement might convey the wrong message.
I assumed that OP likely has operations more complicated than multiplying all elements of an array by 2. Broadcasting is great for smallish and rather simple operations, where it is also very readable. However, in general explicit loops offer the most expressive power (i.e. there is loop code that you cannot reasonably express as a broadcast). So in general a for loop is the most basic construct for efficient iteration, while broadcasting can make things nicer to read and perhaps a bit easier to write, if and only if the problem is suited for broadcasting.
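To make the expressiveness point concrete, here is a small example of my own: a loop with a dependence between iterations, which cannot be written as a single broadcast over the input because each element needs the previous result.

```julia
# Running (prefix) sum: out[i] depends on out[i-1], so no single
# element-wise broadcast over x can express this computation.
function running_sum(x)
    out = similar(x)
    acc = zero(eltype(x))
    for i in eachindex(x)
        acc += x[i]
        out[i] = acc
    end
    return out
end

running_sum([1, 2, 3, 4])  # [1, 3, 6, 10]
```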
And for performance I was referring to:
So especially for beginners, I would recommend writing correct loop-based code first and then learning to see opportunities where broadcasting can simplify things. However, GPU computing might be an exception.
Writing loop-based code can in some cases be significantly more challenging; in particular, you need to pre-allocate and figure out the correct element types, which can be highly non-trivial. Broadcasting is also likely to be very familiar to beginners coming from the Python/Matlab/R world.
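A small sketch of the element-type issue (my own example): with a loop you must choose the output element type up front, whereas broadcasting infers it for you.

```julia
# Loop version: dividing Ints produces Float64, so the output array
# must be allocated with the promoted element type, chosen by hand.
function halve_loop(x)
    out = Vector{float(eltype(x))}(undef, length(x))
    for i in eachindex(out, x)
        out[i] = x[i] / 2
    end
    return out
end

# Broadcast version: the result element type is inferred automatically.
halve_bc(x) = x ./ 2
```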
I would definitely recommend both styles from the very beginning.
Thanks for your comments and advice. The original code misses exclamation marks in the function signatures, and the use of map! opens up aliasing issues if the mutated argument shares memory with any other argument. So I conclude that map and map! are indeed slow and not recommended right now. Promoting map! on my end will sooner or later lead to an aliasing issue once the subtleties of the documentation are forgotten.

It is better to use broadcast or broadcast! as a replacement for map and map!. An allocation for the result of the calculation can be avoided if a pre-existing collection can be overwritten using .=. From the documentation: “A special syntax exists for broadcasting: f.(args...) is equivalent to broadcast(f, args...)”, so the dot notation may as well be used for the right-hand side instead of typing broadcast in full. Combining these two gives several dots, and the @. macro can be used to add the dots automatically. Also, broadcasting may allocate more memory than expected. With a for loop it is unlikely that performance or clarity is lost.
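A small sketch of these points (my own example): `.=` reuses the destination, `@.` adds every dot automatically, and `map!` mutates its first argument, which is where the aliasing caveat from the docs comes in.

```julia
x = [1.0, 2.0, 3.0]
y = similar(x)

# In-place broadcast via dot syntax: .= writes into y, no new array
y .= 2 .* x .+ 1

# @. dots every call and operator, and turns = into .=
@. y = 2x + 1

# map! mutates its FIRST argument; the docs warn that behavior can be
# unexpected when the destination shares memory with another argument
map!(v -> 2v + 1, y, x)
```

All three lines produce the same `y` without allocating a fresh result array.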
One thing I learned while thinking about this is that you really do have to explicitly broadcast the multiplication, otherwise the following forms are slow.
julia> function f(x)
x .= 2x
end
f (generic function with 1 method)
julia> @benchmark f($x)
BenchmarkTools.Trial: 10000 samples with 194 evaluations.
 Range (min … max):  507.515 ns … 61.053 μs  ┊ GC (min … max):  0.00% … 98.65%
 Time  (median):     817.332 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.165 μs ±  1.813 μs  ┊ GC (mean ± σ):  27.94% ± 16.65%

 [histogram bars not reproduced]
 508 ns          Histogram: log(frequency) by time          10.5 μs <
Memory estimate: 8.00 KiB, allocs estimate: 1.
julia> function f(x)
x .= 2*x
end
f (generic function with 1 method)
julia> @benchmark f($x)
BenchmarkTools.Trial: 10000 samples with 193 evaluations.
 Range (min … max):  506.047 ns … 61.707 μs  ┊ GC (min … max):  0.00% … 98.48%
 Time  (median):     823.187 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.195 μs ±  1.886 μs  ┊ GC (mean ± σ):  28.36% ± 16.65%

 [histogram bars not reproduced]
 506 ns          Histogram: log(frequency) by time            11 μs <
Memory estimate: 8.00 KiB, allocs estimate: 1.
julia> function f(x)
x .= 2 .* x
end
f (generic function with 1 method)