I’m not sure whether I can equivalently rewrite a for loop using the dot syntax. Here “equivalently” is in terms of both the logical execution order and the performance.
Looking at the code below, the question is: can I equivalently rewrite main1 into main2? (Or: does main2 lower into main1?) Thanks a lot.
```julia
import Random

function main1()
    function subprocedure(c)
        a += sum(c)
    end
    a = 0
    for _ in 1:5
        subprocedure(rand(1:9, 4))
    end
    return a
end

function main2()
    function subprocedure(c)
        a += sum(c)
    end
    a = 0
    subprocedure.([rand(1:9, 4) for _ in 1:5])
    return a
end

common_seed = 678;
Random.seed!(common_seed);
@time main1()
Random.seed!(common_seed);
@time main2()
```
I mean, the answer depends upon your threshold for what equivalent means. Broadcasting itself is built with for loops that you yourself could have written… but with a lot of extra machinery.
But the bigger question is why? Broadcasting is built to apply a function over all elements of some array(s) and return the results for all those elements. In main2(), you’re artificially creating an array — that comprehension — just so you can broadcast over it. And broadcasting also creates an array of results that you don’t even look at. Those are both needless steps, especially when a for loop is the obvious way to write such a thing.
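If you do want the compact one-call flavor without the throwaway arrays, one sketch (using a hypothetical `main3` as the variant's name) is `foreach`, which applies a function for its side effects and discards the results:

```julia
function main3()
    a = 0
    subprocedure(c) = (a += sum(c))
    # foreach discards return values, so no result array is allocated;
    # only the five temporary rand vectors remain.
    foreach(_ -> subprocedure(rand(1:9, 4)), 1:5)
    return a
end
```

This keeps the same closure-over-`a` pattern as the original, so it's a readability tweak, not a performance one.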
What differences are you noting? I think the biggest one is probably compile time — broadcasting is more complicated, so main2 will likely take longer to compile on the first run, but second calls will perform much more similarly. The way your timing is set up, you’re measuring nearly all compile time.
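A generic sketch of how to see this (not your exact functions): time the same call twice — the first `@time` includes compilation, the second measures just the run.

```julia
f(n) = sum(sin, 1:n)   # any small function will do

@time f(10^5)   # first call: compile time dominates
@time f(10^5)   # second call: essentially just run time
```

For serious measurements, BenchmarkTools.jl's `@btime` handles this (and more) for you.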
It’s probably also worth noting that closures like subprocedure, which reassign a variable captured from an enclosing scope, are not going to give you optimal performance.
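Because `a` is reassigned inside the closure, Julia has to box it and the accumulation becomes type-unstable. One common workaround (a sketch; `main_ref` is a hypothetical name) is to mutate the contents of a `Ref` instead of rebinding a variable:

```julia
function main_ref()
    a = Ref(0)                         # the binding `a` never changes...
    subprocedure(c) = (a[] += sum(c))  # ...only its contents do
    for _ in 1:5
        subprocedure(rand(1:9, 4))
    end
    return a[]
end
```

Passing the accumulator as an argument and returning the new value is another (often cleaner) option.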
The for loop executes in series: in `for i in 1:5`, the iteration i = 2 starts only after i = 1 has finished.
If we don’t care about that execution order, e.g. `ys = sin.([1, 2, 3.])`, then broadcasting is clearly more convenient. So here is a question: will broadcasting be faster than its naive for-loop counterpart? From your answer, I guess the answer is no. (Am I correct?)
From your answer, it makes more sense to use `for` directly in the context of my example. But sometimes the dot syntax makes the code look more compact. Then the question is: will the execution order still be correct? E.g., the computation of sin(2.) should happen only after sin(1.) has executed (in case some state is changed by executing sin(1.), like the a in my example). And if the execution order is correct, will the dot syntax bring any benefit in terms of computational performance? (From your answer, I guess the answer is again no. Am I correct?)
But this style is quite handy and facilitates debugging. In practice I always go with it; otherwise my code appears way too cumbersome, which distracts me.
Correct, but you can also quite easily enable parallelism on a regular for loop yourself with a @threads for or the like.
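For instance, when the per-iteration work is independent (unlike the order-dependent `a +=` in your example), a threaded reduction could be sketched like this (`threaded_total` is a hypothetical name):

```julia
using Base.Threads

# Each iteration writes to its own slot, so no locks or atomics are
# needed; the partial results are combined afterwards.
function threaded_total(chunks)
    partial = zeros(Int, length(chunks))
    @threads for i in eachindex(chunks)
        partial[i] = sum(chunks[i])
    end
    return sum(partial)
end

threaded_total([rand(1:9, 4) for _ in 1:5])
```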
As for performance: depending upon the input types, broadcast might do something fancy and (for example) avoid repeated evaluations of a structural zero or the like. But it’s all just implemented in Julia itself and something you could write yourself. Julia’s for loops are fast! The one exception is with GPU arrays; there broadcasting may exploit parallelism in ways that are harder to express with a for loop.
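A concrete example of that “something fancy”: broadcasting a function `f` with `f(0) == 0` over a sparse matrix preserves sparsity, rather than materializing a dense result — a sketch using the standard-library SparseArrays:

```julia
using SparseArrays

A = sprand(1000, 1000, 0.001)  # ~0.1% of entries are nonzero
B = sin.(A)                    # sin(0) == 0, so the result stays sparse
```

A naive double for loop over all 10^6 entries would evaluate `sin(0)` for every structural zero.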
Most functions for accumulating or reducing arrays/vectors (like `sum`, `minimum`, `broadcast`, etc.) are implemented in Julia with for loops. Writing the for loop yourself will often run equally fast. Sometimes it may run faster, because the built-in functions are generic whereas your own version is specific to your data. Sometimes the built-in function is faster because it’s written in a way that exploits low-level parallelism (SIMD, etc.), memory locality, cache optimization, and so on.
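As a sketch of both points: a hand-written loop with `@inbounds` and `@simd` (the same kind of low-level tricks Base’s `sum` uses) is typically competitive with the built-in:

```julia
# Specialized to Float64 vectors, unlike the generic built-in sum.
function mysum(xs::AbstractVector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(xs)  # skip bounds checks, allow vectorization
        s += xs[i]
    end
    return s
end
```

Note that `@simd` permits reassociating the floating-point additions, so results can differ from `sum` in the last bits.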