Hi,

I was rewriting some Matlab code into Julia, and I noticed that one function which used array broadcasting operations ran several times slower on Julia. I though this was really strange and I made a simple test function to test broadcasting operations in a few different ways and print the elapsed times. So, the test function is the following:

``````function test_broadcast(M::Int, N::Int, niter::Int=1000)
X = randn(M);

# Case A - simple broadcast
A = zeros(ComplexF64, M, N)
@time begin
for ii = 1:niter
A .+= exp.(im*X)
end
end

# Case B - broadcast with map
B = zeros(ComplexF64, M, N)
@time begin
for ii = 1:niter
B .+= map(x -> exp(x), im*X)
end
end

C = zeros(ComplexF64, M, N)
@time begin
for ii = 1:niter
end
end

# Case D - broadcast using auxiliary variable
D = zeros(ComplexF64, M, N)
@time begin
for ii = 1:niter
Daux = exp.(im*X)
D .+= Daux
end
end

if !all((A .== B) .& (A .== C) .& (A .== D))
throw(ErrorException("Results mismatch"))
end
end
``````

when executing this code using for `M = 10000` and `N = 10`, I get the following results

``````julia> test_broadcast(10000, 10)
1.826595 seconds (2.00 k allocations: 152.664 MiB, 0.13% gc time)
0.275579 seconds (5.00 k allocations: 305.344 MiB, 1.73% gc time)
0.268911 seconds (4.00 k allocations: 305.328 MiB, 1.74% gc time)
0.269613 seconds (4.00 k allocations: 305.328 MiB, 1.77% gc time)
``````

So in this example the case A is roughly 7 times slower than all the other cases.

This code is really simple, I donâ€™t know if I am missing anything. According to the documentation the case A should be equivalent to using the broadcast function. I tried this with Julia versions 1.1, 1.2, and 1.3-rc3, and the results are similar. To me this seems like a bug. I have searched open issues in github and there is this bug https://github.com/JuliaLang/julia/issues/28126 where it seems that there is a problem with broadcasting when using many arrays of the same size, but in my example I only have one array, so it seems that this is a different problem.

This is my output of versioninfo:

``````julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = "/usr/share/code/code"
``````

Any idea of why the case A is much worse?

You are breaking the broadcast fusion by not proadcasting the multiplication by `im`. Try to eother broadcast it or put the multiplication by im into the function youâ€™re mapping with.

While true, the problem here is different. With the broadcasting, the `exp` is computed once for every element in `A`, that is, (M * N) times (it has to be, the `+=` is fused with the call). In the other cases, `exp` is computed only `M` times and the result is then added to all the columns.

Just adding a print statement to `exp`, making the sizes a bit smaller gives:

``````julia> test_broadcast(3, 2, 1)
A
Calling exp
Calling exp
Calling exp
Calling exp
Calling exp
Calling exp

B
Calling exp
Calling exp
Calling exp

C
Calling exp
Calling exp
Calling exp

D
Calling exp
Calling exp
Calling exp
``````
6 Likes

Ah, that was a really sneaky one! I could have stared at that for a long while before figuring that out

On an unrelated note, `cis(x)` should be a faster equivalent of `exp(im * x)` for real `x`

3 Likes

Ah, thank you very much!, I was really confused by this, now I understand how the dot operator and the fusing of operations works.

Also thanks to stevengj for suggesting the `cis` function, I didnâ€™t know about it. I have tested it and it indeed runs a bit faster.