Performance of array broadcasting

xqrongm · July 7, 2017, 7:05pm

Julia v0.6.0. I have the following snippets. I wonder why “snippet_1” is so much more efficient than the others. Is it possible to make the others as efficient as “snippet_1”? Thanks!

i_ = 40
j_ = 16
a1 = rand(i_,j_,2);


# snippet_1: 0.0005 seconds (5.20 k allocations: 302.531 KiB)
@time begin
    for i in 1:i_, j in 1:j_
        a1ij = a1[i,j,:]
        a1[i,j,:] = rand(2) .* [1,1]
    end
end

# snippet_2: 0.032269 seconds (25.43 k allocations: 1.150 MiB)
@time begin
    for i in 1:i_, j in 1:j_
        a1ij = a1[i,j,:]
        a1[i,j,:] = rand(2) .* 1
    end
end

# snippet_3: 0.030712 seconds (44.24 k allocations: 1.946 MiB)
@time begin
    for i in 1:i_, j in 1:j_
        a1ij = a1[i,j,:]
        a1[i,j,:] = a1ij .* [1,1]
    end
end

# snippet_4: 0.032377 seconds (24.79 k allocations: 1.091 MiB)
@time begin
    for i in 1:i_, j in 1:j_
        a1ij = a1[i,j,:]
        a1[i,j,:] = a1ij .* 1
    end
end

ChrisRackauckas · July 7, 2017, 7:11pm

You’re creating temporaries by slicing. Use @view A[i:j] etc.

xqrongm · July 7, 2017, 7:18pm

Thanks for the suggestion about @view, but I’m not sure I’m using it the right way. With the code below, @view seems to be less efficient.


julia> @time x = a1[1,1,:]
  0.000005 seconds (6 allocations: 272 bytes)
2-element Array{Float64,1}:
 0.228546
 0.995806

julia> @time x = @view a1[1,1,:]
  0.000074 seconds (37 allocations: 896 bytes)
2-element SubArray{Float64,1,Array{Float64,3},Tuple{Int64,Int64,Base.Slice{Base.OneTo{Int64}}},true}:
 0.228546
 0.995806

ChrisRackauckas · July 7, 2017, 7:21pm

Well the other thing is that your indexing is backwards. In a column-major language (Julia), you want to be slicing by columns, i.e. the first index. You want @view a1[:,1,1]. All of your loops are backwards too.
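
A minimal sketch of that layout advice, written in Julia 1.x syntax rather than the v0.6.0 used in this thread. The array shape (2, 16, 40) and the function name `scale_columns!` are assumptions made for illustration, not code from the thread. The length-2 axis comes first so that each slice is contiguous in memory, and the loops run with the last index outermost.

```julia
# Hypothetical helper: scales every length-2 column of `a` in place.
function scale_columns!(a, s)
    # k is the outer loop and j the inner loop, matching column-major storage.
    for k in axes(a, 3), j in axes(a, 2)
        v = @view a[:, j, k]   # no copy; v refers to memory inside a
        v .*= s                # in-place broadcast, no temporary array
    end
    return a
end

a = rand(2, 16, 40)            # sliced axis first (assumed layout)
before = copy(a)
scale_columns!(a, 3.0)
println(a ≈ 3.0 .* before)     # prints true
```

The array is passed as an argument, so the function does not depend on a global variable.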

stevengj · July 7, 2017, 7:24pm

(Don’t benchmark in global scope. Use .= for in-place assignment that fuses with operations like .*.)

Note that for initializing only two components, the overhead even of creating a view, or calling broadcast, or allocating a temporary array like [1,1] will be very significant. For such a small array in an inner loop, the fastest things are to write out your own loop or to use something like the StaticArrays package.
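
A rough sketch of the "write out your own loop" option, next to a slicing version in the style of snippet_4. It is written in Julia 1.x syntax, and the function names are hypothetical. Both functions compute the same result; the hand-written loop just avoids the per-iteration slice and the temporary arrays.

```julia
# Slicing version in the style of snippet_4: a[i, j, :] on the right-hand side
# copies the slice, and the broadcast allocates a second temporary.
function scale_by_slices!(a, s)
    for i in axes(a, 1), j in axes(a, 2)
        a[i, j, :] = a[i, j, :] .* s
    end
    return a
end

# Hand-written loop: touches each element directly, with no slices or temporaries.
# The first index is the innermost loop to match Julia's column-major layout.
function scale_by_loop!(a, s)
    for k in axes(a, 3), j in axes(a, 2), i in axes(a, 1)
        a[i, j, k] *= s
    end
    return a
end

x = rand(40, 16, 2)
y = copy(x)
scale_by_slices!(x, 2.0)
scale_by_loop!(y, 2.0)
println(x == y)   # prints true
```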

xqrongm · July 7, 2017, 7:28pm

That is a good point. But my code has to slice the array along every dimension. Am I supposed to use permutedims to move the dimension I want to slice into the first position?
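
For reference, a small sketch of what permutedims does, in Julia 1.x syntax. It is an illustration of the question, not an answer given in this thread. `permutedims` returns a reordered copy, so it costs one full pass over the data. Whether that pays off depends on how often the slices are reused, which this sketch does not measure.

```julia
a = rand(40, 16, 2)

# Move the third axis to the front: b[k, i, j] == a[i, j, k].
b = permutedims(a, (3, 1, 2))
println(size(b))                      # prints (2, 40, 16)

# A contiguous view in b holds the same values as the strided slice in a.
v = @view b[:, 3, 5]
println(v == a[3, 5, :])              # prints true
```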

xqrongm · July 7, 2017, 7:37pm

Thanks for the suggestion, but in the case below, .= seems less efficient than =:

# snippet_1: 0.0005 seconds (5.20 k allocations: 302.531 KiB)
@time begin
    for i in 1:i_, j in 1:j_
        a1ij = a1[i,j,:]
        a1[i,j,:] = rand(2) .* [1,1]
    end
end

# snippet_5: 0.006824 seconds (28.24 k allocations: 762.531 KiB)
@time begin
    for i in 1:i_, j in 1:j_
        a1ij = a1[i,j,:]
        a1[i,j,:] .= rand(2) .* [1,1]
    end
end

ChrisRackauckas · July 7, 2017, 7:41pm

Again, you’re timing globals and compilation times. Put it in a function. And “vectorization” only makes sense when the vectors are sufficiently large (even then, it won’t be as efficient as a loop, just close).
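
One way to separate compilation time from run time, sketched in Julia 1.x syntax with a hypothetical function name: pass the array as an argument, call the function once to compile it, and only then measure. No timings are quoted here because they depend on the machine and the Julia version.

```julia
# Hypothetical function; the array is an argument, not a global.
function fill_pairs!(a)
    for i in axes(a, 1), j in axes(a, 2)
        a[i, j, :] = rand(2)
    end
    return a
end

a = rand(40, 16, 2)
fill_pairs!(a)                      # first call includes compilation; discard it
t = @elapsed fill_pairs!(a)         # later calls measure run time only
bytes = @allocated fill_pairs!(a)   # bytes allocated during one call
println("time: ", t, " s, allocated: ", bytes, " bytes")
```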

xqrongm · July 7, 2017, 7:50pm

I’m not sure what you mean by “put it in a function”. This is what I did; below, = still seems more efficient than .=. I ran the last two lines a couple of times to let the code compile before writing down the elapsed time.

function f1()
    for i in 1:i_, j in 1:j_
        a1ij = a1[i,j,:]
        a1[i,j,:] = rand(2) .* [1,1]
    end
end

function f5()
    for i in 1:i_, j in 1:j_
        a1ij = a1[i,j,:]
        a1[i,j,:] .= rand(2) .* [1,1]
    end
end

@time f1() # 0.001070 seconds (5.21 k allocations: 302.688 KiB)
@time f5() # 0.008428 seconds (28.25 k allocations: 762.688 KiB)

xqrongm · July 7, 2017, 7:53pm

I guess you mean creating a vector by slicing an array isn’t as efficient as a loop? That’s a good point and I’ll remember it for future use. But for my code, I slice it because I need to use the vector for matrix multiplication later, I haven’t thought of a way to avoid slicing it into a vector.

ggggggggg · July 7, 2017, 8:30pm

Read the performance tips. In particular, you are having problems with global variables, which is what the “put it in a function” comment was meant to address. Your attempt to put it in a function didn’t address the problem, because you were still referring to global variables rather than passing them as arguments.

Consider the following. The only difference between the two functions is that f1 takes an argument, and thus can compile to specialized code based on the type of a, whereas f2 refers to a global variable. Since that variable can be changed at will, Julia won’t specialize the code of f2 based on the type of a. Thus f2 is slow.

a=rand(40,16,2)
function f1(a)
    for i = 1:size(a,1), j=1:size(a,2)
        a[i,j,:] = rand(2)
    end
end
function f2()
    for i = 1:size(a,1), j=1:size(a,2)
        a[i,j,:] = rand(2)
    end
end
@time f1(a) #0.000263 seconds
@time f2()  #0.001927 seconds
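
If a global really is needed, the performance tips also describe declaring it `const`, which fixes the type of the binding so the compiler can specialize on it. A minimal sketch in Julia 1.x syntax; the names `A_CONST` and `fill_const_global!` are hypothetical and not from the thread.

```julia
# `const` promises the binding will not be rebound to a value of another type,
# so functions that read it can be compiled for Array{Float64,3}.
const A_CONST = rand(40, 16, 2)

function fill_const_global!()
    for i in axes(A_CONST, 1), j in axes(A_CONST, 2)
        A_CONST[i, j, :] = rand(2)   # the contents may still be mutated
    end
    return A_CONST
end

fill_const_global!()
println(size(A_CONST))   # prints (40, 16, 2)
```

Passing the array as an argument, as f1 does above, remains the more flexible choice because the same function then works for any array.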

xqrongm · July 8, 2017, 1:59pm

OK. Now I get it. Thanks, everyone, for the great advice!
