When can one expect assignment to an array to run in parallel?

Consider

N = 8_000_000_000
a = zeros(Int64, N)
a .= 1

For some reason I expected the broadcast assignment to run in parallel on multiple threads. According to my experiments, that is not happening.

Does anyone have any insight?

AFAIK, there is no implicit parallelism in Julia (except in BLAS). I think that's a good thing, because your assignment could be called from inside a multithreaded program, and nested multithreading is difficult to set up in the general case.

In addition, you can only expect a moderate speedup on such a massively memory-bound task.
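For illustration, here is a minimal sketch of making the parallelism explicit with `Threads.@threads` from Base. The function name `threaded_fill!` is hypothetical (it is not part of Julia), and the array is much smaller than in the original post so that it fits in memory on most machines.

```julia
using Base.Threads

# Hypothetical helper: fill `a` with `value`, splitting the index range across threads.
function threaded_fill!(a::AbstractVector, value)
    @threads for i in eachindex(a)
        @inbounds a[i] = value
    end
    return a
end

N = 10_000_000            # deliberately smaller than the 8_000_000_000 in the question
a = zeros(Int64, N)
threaded_fill!(a, 1)
```

This only uses more than one thread if Julia was started with several, e.g. `julia -t 4` or with `JULIA_NUM_THREADS` set; `Threads.nthreads()` reports the current count.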

That is true. But that is precisely what happens with BLAS: it runs multithreaded, no matter what.
And I thought this operation would end up in BLAS…

True, but I did get a speedup of over 2.3x on four threads. Not bad…


That makes sense. I do not think this assignment is turned into a BLAS level-1 copy (no performance improvement is expected for such a trivial task). I guess that recent versions of LoopVectorization.jl can parallelize broadcast expressions now.

Yes, it should exactly match your maximum hardware bandwidth.
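A rough way to check how close you get is to time the assignment and divide the bytes written by the elapsed time. This is only a sketch: the function name `write_bandwidth_gbs` is hypothetical, it counts only the bytes stored into the array (actual memory traffic may be higher), and a single timing is noisy.

```julia
# Hypothetical helper: effective write bandwidth in GB/s for `a .= value`.
function write_bandwidth_gbs(a::AbstractArray, value)
    a .= value                      # warm-up run, so compilation is not timed
    t = @elapsed a .= value         # time one pass over the array
    return sizeof(a) / t / 1e9      # bytes written / seconds, in GB/s
end

a = zeros(Int64, 100_000_000)       # about 0.8 GB; adjust to your machine
bw = write_bandwidth_gbs(a, 1)
println("effective write bandwidth: ", round(bw; digits = 2), " GB/s")
```

Comparing this number between the plain broadcast and a threaded version gives a feel for how much headroom is left before the memory bus saturates.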
