Reason for the time difference in assigning columns with dot syntax

Shuhua · October 9, 2020, 12:08pm

julia> const N = 10000;

julia> const x = rand(N);

julia> const m = zeros(N, 2);

julia> using BenchmarkTools

julia> @btime $m[:, 1] = $x;
  12.799 μs (0 allocations: 0 bytes)

julia> @btime $m[:, 1] .= $x;
  2.633 μs (0 allocations: 0 bytes)

Since neither method involves temporary memory allocation, why is there a significant time difference? Does it mean that we should always prefer the dot syntax when assigning a vector (or a column/row) in place? Thank you.

julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, ivybridge)
Environment:
  JULIA_NUM_THREADS = 4

CameronBieganek · October 9, 2020, 2:10pm

That’s a good question. I’ve also wondered about the difference between m[:, 1] = x and m[:, 1] .= x. It doesn’t seem to be well documented, at least not that I’ve found.

Observe the printed output in the following example:

julia> x = [1.0, 2.0];

julia> m = zeros(2, 2);

julia> m[:, 1] = x
2-element Array{Float64,1}:
 1.0
 2.0

julia> m[:, 2] .= x
2-element view(::Array{Float64,2}, :, 2) with eltype Float64:
 1.0
 2.0

Note that when you do m[:, 2] .= x the printed output is some kind of view. My hypothesis is that when you do m[:, 1] = x, the elements of x get copied into the first column of m. But when you do m[:, 1] .= x, the first column of m becomes a reference (a view) to the elements of x, so no copying occurs. But I could be wrong. Hopefully somebody more knowledgeable will chime in.

kristoffer.carlsson · October 9, 2020, 3:00pm

That is not true.
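
A quick way to check this, as a minimal sketch using only Base: if `.=` made the column a reference to x, then mutating x afterwards would also change m. It does not, so both forms copy the elements.

```julia
x = [1.0, 2.0]
m = zeros(2, 2)

m[:, 1] = x     # plain indexed assignment
m[:, 2] .= x    # broadcast assignment

# Mutate the source afterwards. A column that merely referenced x
# would change as well; a copied column stays as it was.
x[1] = 99.0

println(m[:, 1])  # [1.0, 2.0]
println(m[:, 2])  # [1.0, 2.0]
```

The view printed by `m[:, 2] .= x` is a view into m (the destination), not into x.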

CameronBieganek · October 9, 2020, 3:01pm

Ok, then it would be great if somebody could explain the difference.

kristoffer.carlsson · October 9, 2020, 3:23pm

The broadcast one ends up here:

https://github.com/JuliaLang/julia/blob/02032c4af67e4257d1e5b7f26875d00330d481b3/base/abstractarray.jl#L919-L921

The setindex ends up in:

https://github.com/JuliaLang/julia/blob/02032c4af67e4257d1e5b7f26875d00330d481b3/base/multidimensional.jl#L805-L822

It seems the compiler likes one of them more than the other.

CameronBieganek · October 9, 2020, 3:36pm

Interesting. So it seems the performance difference is mostly due to implementation details.

So, my current understanding is that m[:, 1] = x and m[:, 1] .= x do the same thing semantically; the only difference is the value returned from the assignment expression. The regular assignment expression returns x, whereas the broadcasting assignment expression returns @view m[:, 1].

Here’s an example that does not disprove my hypothesis:

julia> a = [1, 2, 3, 4]; b = [10, 11];

julia> out = (a[2:3] = b);

julia> typeof(out)
Array{Int64,1}

julia> out === b
true

julia> a = [1, 2, 3, 4]; b = [10, 11];

julia> out = (a[2:3] .= b);

julia> typeof(out)
SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true}

julia> parent(out) === a
true

CameronBieganek · October 9, 2020, 3:47pm

Another example:

julia> a = [1, 2, 3, 4]; b = [10, 11];

julia> x = a[2:3] .= b
2-element view(::Array{Int64,1}, 2:3) with eltype Int64:
 10
 11

julia> x[2] = 100
100

julia> a
4-element Array{Int64,1}:
   1
  10
 100
   4

julia> b
2-element Array{Int64,1}:
 10
 11

I’m not sure how useful this behavior is in practice. It depends on how often you perform multiple assignments in one statement like x = a[2:3] .= b.

Shuhua · October 10, 2020, 3:23am

@CameronBieganek @kristoffer.carlsson Thank you for your help, but I am afraid we still have not found the true reason. Let’s dig into it together. I tried to inspect the code by lowering it, and something interesting happened.

@code_lowered m[:, 1] = x

CodeInfo(
1 ─      nothing
│   %2 = Base.IndexStyle(A)
│   %3 = Core.tuple(%2, A)
│        Core._apply_iterate(Base.iterate, Base.error_if_canonical_setindex, %3, I)
│   %5 = Base.IndexStyle(A)
│   %6 = Core.tuple(%5, A, v)
│   %7 = Base.to_indices(A, I)
│   %8 = Core._apply_iterate(Base.iterate, Base._setindex!, %6, %7)
└──      return %8
)

However, the “dot” one cannot be inspected as above.

@code_lowered m[:, 1] .= x

Error: expression is not a function call, or is too complex for @code_lowered to analyze; break it down to simpler parts if possible. In some cases, you may want to use Meta.@lower.

As suggested by the error message, I then used Meta.@lower.

Meta.@lower m[:, 1] .= x

:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Base.dotview(m, :, 1)
│   %2 = Base.broadcasted(Base.identity, x)
│   %3 = Base.materialize!(%1, %2)
└──      return %3
))))

Now it becomes clear that the dot syntax triggered broadcasting, which was then translated to totally different operations internally. Nevertheless, I thought the compiler should be smart enough to treat the two syntaxes identically.

In summary, the non-dot version goes through Core._apply_iterate, while the dot version goes through Base.broadcasted. The time difference suggests that something in the broadcasting path leads to better-optimized machine code. However, I do not know the details at a deeper level. I hope this finding provides some clues.
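
The two forms can also be called by hand to confirm that they write the same values. This is only a sketch: `Base.dotview`, `Base.broadcasted`, and `Base.materialize!` are the internal names that appear in the lowered output above, they are not public API, and they may change between Julia versions.

```julia
x = rand(4)
m1 = zeros(4, 2)
m2 = zeros(4, 2)

# m1[:, 1] = x is syntax for a setindex! call.
setindex!(m1, x, :, 1)

# m2[:, 1] .= x lowers to these three steps.
dest = Base.dotview(m2, :, 1)        # view into the first column of m2
bc = Base.broadcasted(identity, x)   # describes the broadcast of identity over x
Base.materialize!(dest, bc)          # writes the elements into dest

println(m1 == m2)  # true
```

Both paths leave the matrices with identical contents; only the code that performs the copy differs.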

kristoffer.carlsson · October 10, 2020, 6:47am

I showed the places where the actual copying ends up happening. They are different in the broadcast case and the setindex case. So there is no magic, just two different pieces of generic code (which in this case happen to do the same thing) with different performance.

Shuhua · October 10, 2020, 8:35am

Thank you, got it. However, why doesn’t the compiler generate the same efficient code for the two styles? I mean, the compiler is expected to reduce the non-dot version to the dot version automatically, since there seems to be no side effect.

Currently, we have to remember to use the dot broadcasting syntax even in this naive scenario. I often overlook it because it appears obvious to me that no memory allocation is involved and the operation is expected to be efficient enough.
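
Where the gap matters, the candidate ways of writing the assignment can be put side by side and timed with @btime as in the first post. The following is a sketch using only Base; `assign_column!` is a hypothetical helper introduced here for illustration, and no claim is made about which variant is fastest on a given Julia version or machine.

```julia
# Hypothetical helper: copy x into column j of m with an explicit loop.
function assign_column!(m::AbstractMatrix, j::Integer, x::AbstractVector)
    axes(m, 1) == axes(x, 1) ||
        throw(DimensionMismatch("column and vector have different axes"))
    @inbounds for i in axes(x, 1)
        m[i, j] = x[i]
    end
    return m
end

N = 10_000
x = rand(N)
a = zeros(N, 2); b = zeros(N, 2); c = zeros(N, 2); d = zeros(N, 2)

a[:, 1] = x                  # indexed assignment (setindex!)
b[:, 1] .= x                 # broadcast assignment
copyto!(view(c, :, 1), x)    # explicit copy into a view of the column
assign_column!(d, 1, x)      # hand-written loop

# All four variants leave the same data in the matrix.
println(a == b && b == c && c == d)  # true
```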

kristoffer.carlsson · October 10, 2020, 8:49am

There could be many reasons; a compiler is, after all, just a program written by people.

I agree it would be good for these to perform the same. You could open an issue with your example.
