Julia in these cases is essentially doing the loop

```julia
for i in eachindex(m)
    m[i] = ...
end
```

i.e. it's an in-place update of the existing array m, rather than binding m to a new array.
That is an in-place update which allocates no new arrays.
This creates a new array via the comprehension and rebinds m to it (so you don't need to pre-allocate here).
This keeps creating a new array in the comprehension on every iteration, then fills m with it in a loop.
With that understanding it should be pretty obvious why code 1 < code 2 < code 3. Code 2 < code 3 because code 3 has an additional loop, whereas code 2 just changes a pointer. Vectorization that creates temporary arrays is always slower, but it's acceptable in most scripting languages because their loops (and the language itself) are slower than the vectorized functions, which are written in C. Since Julia's standard constructs compile to code that is as fast as C (and the standard library is written in Julia, so vectorized functions are just calling other Julia code), warping code to fit inside a giant vectorized function call is not better in any a priori sense; what matters is simply the most efficient calculation. Here, that means reducing the number of temporary arrays and the number of loops.
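As a concrete sketch of the three patterns being compared (sizes and variable names here are illustrative, not the OP's exact code):

```julia
s = 500

# Code 1 style: write into a pre-allocated array; no temporaries are made.
m1 = zeros(s, s)
for j in 1:s, i in 1:s
    m1[i, j] = j
end

# Code 2 style: the comprehension builds a new array and = rebinds the name.
m2 = [j for i in 1:s, j in 1:s]

# Code 3 style: the comprehension still builds a temporary array, and then
# .= loops over it to copy every element into the pre-allocated m3.
m3 = zeros(s, s)
m3 .= [j for i in 1:s, j in 1:s]

@assert m1 == m2 == m3
```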
The one that you missed is:

```julia
for k in 1:n
    m .= j
end
```

which is the loop-free version of your first code. Broadcasting can get as fast as the loop in an asymptotic sense, but vectorization like this isn't strictly better in any sense.
Also, you should look into BenchmarkTools.jl for doing this kind of benchmarking.
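A minimal sketch of how that looks (assumes BenchmarkTools.jl has been installed; the helper function name is made up):

```julia
using BenchmarkTools  # assumes the package has been added via Pkg

# Hypothetical helper that fills each column j of m with the value j.
function fillcols!(m)
    for j in axes(m, 2), i in axes(m, 1)
        m[i, j] = j
    end
    return m
end

m = zeros(500, 500)
@btime fillcols!($m)  # $ interpolates the global so only the fill is timed
```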
“vectorization” does not make code fast, and what you are doing is not actually vectorization anyway…
In general, .= is slower than =. It can only be faster by removing allocations on the RHS through fusion. In all the code you have above, you always allocate an array on the RHS, so with = you do no extra work, but with .= you copy the data and then discard the temporary array; the extra copy makes it slower.
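A small sketch of that difference (s is illustrative):

```julia
s = 500

# With =, the comprehension's array simply becomes m: build once, done.
m = [j for i in 1:s, j in 1:s]

# With .=, the same temporary array is still built on the RHS, and then
# every element is copied into the existing array before the temporary
# is discarded -- strictly more work than = here.
m2 = zeros(s, s)
m2 .= [j for i in 1:s, j in 1:s]

@assert m == m2
```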
No, because m is an array of size (s,s) and [j for i in 1:s, j in 1:s] is an array of size (s,s). Note that a comprehension is syntax for creating a new array. It will always create an array, so don't use it if you don't want a new array. Broadcasts will “fuse”, but this will not turn into a single loop: it runs in two separate stages, first building a new array and then looping to copy that new array element-wise into m (which is by far the worst way to perform this!).
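One way to see the two stages is to count allocations; the exact byte counts will vary, but the gap is the point (a sketch, with an illustrative fused alternative using a lazy transposed range):

```julia
s = 500
m = zeros(s, s)

# Two stages: the comprehension allocates a full s-by-s temporary,
# then .= loops over it to copy the values into m.
twostage!(m, s) = (m .= [j for i in 1:s, j in 1:s])

# A fused broadcast: (1:s)' is a lazy 1-by-s row that is expanded during
# the assignment itself, so no s-by-s temporary is created.
fused!(m, s) = (m .= (1:s)')

twostage!(m, s); fused!(m, s)   # warm up so compilation isn't measured

@assert @allocated(fused!(m, s)) < @allocated(twostage!(m, s))
@assert m == [j for i in 1:s, j in 1:s]
```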
The best answer is @DNF’s loop. If you want to vectorize it a bit, you’d do something like
```julia
s = 500
n = 9000
m = zeros(s, s)
for k in 1:n
    for j in 1:s
        m[:, j] .= j
    end
end
```
but again, don’t think that vectorizing is “good”. It can sometimes make code cleaner, but it has nothing to do with optimizing code (if anything it can add extra operations).
zeros(s,s) by default creates an array of Float64. You're looking for zeros(Int,s,s). But here's what happened: the first and third codes write into the array m, so it stays Float64. The second code created a new array (of Int) and replaced m with this new array, so the type changed when you did the =.
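A small sketch of that type behavior:

```julia
s = 3

m = zeros(s, s)                  # Float64 array by default
for j in 1:s, i in 1:s
    m[i, j] = j                  # writing an Int converts it to Float64
end
@assert eltype(m) == Float64     # codes 1 and 3: m keeps its element type

m = [j for i in 1:s, j in 1:s]   # code 2: a brand-new Int array is bound to m
@assert eltype(m) == Int

m = zeros(Int, s, s)             # an Int array of zeros from the start
@assert eltype(m) == Int
```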
eachindex is in that order. Comprehensions are in the order you chose, which was outer then inner.
For some reason you still seem really confused. The issue is that you are mixing two concepts: pre-allocation and vectorization. There are two ways you can build the array you want. One is to create an array m = zeros(s,s) and then fill it with the values; that is like code 1. The other is to create the array directly using a comprehension, collect on a generator, etc. In this second case, don't do m = zeros(s,s)! The generator (or whatever you use) is all that is needed, since the array is created by that expression. In case 2, m = [j for i in 1:s, j in 1:s] is perfectly fine if you drop m = zeros(s,s), which is simply not used at all.
No, it's perfectly well defined, so it must be allowed. It's not really the parser's job to figure out what you want to do, and there are also cases where this might have no performance issue at all (and can even be faster than a naively fused loop).
.= is NOT an assignment; it's a mutation of the object, which cannot change its type. OTOH, = is an assignment, and the type of the LHS after the assignment is completely unrelated to its type before that point.
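For example:

```julia
m = zeros(2, 2)        # m names a Float64 matrix

m .= [1 2; 3 4]        # mutation: the Ints are converted into the Float64 array
@assert eltype(m) == Float64

m = [1 2; 3 4]         # assignment: m now names a completely new Int matrix
@assert eltype(m) == Int

m = "anything at all"  # = can rebind m to any type; .= could not do this
@assert m isa String
```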
eachindex does. Comprehensions loop in whatever order you write, so it's up to you to write the right order.
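A sketch of both orders (the arrays here are illustrative):

```julia
A = zeros(Int, 2, 3)
for (k, i) in enumerate(eachindex(A))
    A[i] = k                 # eachindex walks in memory (column-major) order
end
@assert A == [1 3 5; 2 4 6]

# A comprehension loops in the order you wrote: the first variable varies
# fastest and becomes dimension 1 (rows).
B = [10i + j for i in 1:2, j in 1:3]
@assert B == [11 12 13; 21 22 23]
```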
We could potentially change this and make m .= comprehension fuse by definition.
On this general subject as a whole, Julia really does have all the tools in this area that one could want, but I think it's still a bit confusing to the newcomer – a clear and definitive guide to vectorization and looping in Julia remains to be written. We've got some great blog posts on the subject, but a single article that lays it all out doesn't seem to exist.
Although this is not directly related to the OP's question: regarding performance, I would add that @inbounds (e.g. in front of m[i,j]) should increase performance further (about 7% on my computer).
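A sketch of where @inbounds would go (the function name is made up; only use @inbounds when the indices are provably in range):

```julia
# Hypothetical fill function; @inbounds removes bounds checks, which is
# only safe here because axes() guarantees the indices are in range.
function filldemo!(m)
    @inbounds for j in axes(m, 2), i in axes(m, 1)
        m[i, j] = j
    end
    return m
end

@assert filldemo!(zeros(3, 3)) == [1 2 3; 1 2 3; 1 2 3]
```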