Reason for the time difference in assigning columns with dot syntax

julia> const N = 10000;

julia> const x = rand(N);

julia> const m = zeros(N, 2);

julia> using BenchmarkTools

julia> @btime $m[:, 1] = $x;
  12.799 μs (0 allocations: 0 bytes)

julia> @btime $m[:, 1] .= $x;
  2.633 μs (0 allocations: 0 bytes)

Since neither method involves temporary memory allocation, why is there a significant time difference? Does it mean that we should always prefer the dot syntax when assigning a vector (or a column/row) in place? Thank you.

julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, ivybridge)
Environment:
  JULIA_NUM_THREADS = 4

That’s a good question. I’ve also wondered about the difference between m[:, 1] = x and m[:, 1] .= x. It doesn’t seem to be well documented, at least not that I’ve found.

Observe the printed output in the following example:

julia> x = [1.0, 2.0];

julia> m = zeros(2, 2);

julia> m[:, 1] = x
2-element Array{Float64,1}:
 1.0
 2.0

julia> m[:, 2] .= x
2-element view(::Array{Float64,2}, :, 2) with eltype Float64:
 1.0
 2.0

Note that when you do m[:, 2] .= x the printed output is some kind of view. My hypothesis is that when you do m[:, 1] = x, the elements of x get copied into the first column of m. But when you do m[:, 1] .= x, the first column of m becomes a reference (a view) to the elements of x, so no copying occurs. But I could be wrong. Hopefully somebody more knowledgeable will chime in.

That is not true.
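One way to check this, as a minimal sketch: if the column really became a reference to x, then mutating x afterwards would also change m. The variable name `aliased` is only for illustration.

```julia
x = [1.0, 2.0]
m = zeros(2, 2)

m[:, 2] .= x      # broadcast assignment into the second column

x[1] = 99.0       # mutate the source after the assignment

# If the column aliased x, m[1, 2] would now be 99.0.
aliased = (m[1, 2] == 99.0)
println("column aliases x: ", aliased)
```

The elements are copied into m in both forms; the view that gets printed is a view into m, not into x.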

Ok, then it would be great if somebody could explain the difference.

The broadcast one ends up here:

The setindex ends up in:

It seems the compiler likes one of them more than the other.

Interesting. So it seems the performance difference is mostly due to implementation details.

So, my current understanding of the semantic difference between m[:, 1] = x and m[:, 1] .= x is that semantically they do the same thing—the only difference is the value returned from the assignment expression. The regular assignment expression returns x, whereas the broadcasting assignment expression returns @view m[:, 1].

Here’s an example that is consistent with this understanding:

julia> a = [1, 2, 3, 4]; b = [10, 11];

julia> out = (a[2:3] = b);

julia> typeof(out)
Array{Int64,1}

julia> out === b
true

julia> a = [1, 2, 3, 4]; b = [10, 11];

julia> out = (a[2:3] .= b);

julia> typeof(out)
SubArray{Int64,1,Array{Int64,1},Tuple{UnitRange{Int64}},true}

julia> parent(out) === a
true

Another example:

julia> a = [1, 2, 3, 4]; b = [10, 11];

julia> x = a[2:3] .= b
2-element view(::Array{Int64,1}, 2:3) with eltype Int64:
 10
 11

julia> x[2] = 100
100

julia> a
4-element Array{Int64,1}:
   1
  10
 100
   4

julia> b
2-element Array{Int64,1}:
 10
 11

I’m not sure how useful this behavior is in practice. It depends on how often you perform multiple assignments in one statement like x = a[2:3] .= b.

@CameronBieganek @kristoffer.carlsson Thank you for your help, but I am afraid we still have not found the real reason. Let’s work through it together. I tried to inspect the code by lowering it, and something interesting happened.

@code_lowered m[:, 1] = x

CodeInfo(
1 ─      nothing
│   %2 = Base.IndexStyle(A)
│   %3 = Core.tuple(%2, A)
│        Core._apply_iterate(Base.iterate, Base.error_if_canonical_setindex, %3, I)
│   %5 = Base.IndexStyle(A)
│   %6 = Core.tuple(%5, A, v)
│   %7 = Base.to_indices(A, I)
│   %8 = Core._apply_iterate(Base.iterate, Base._setindex!, %6, %7)
└──      return %8
)

However, the dot version cannot be inspected the same way.

@code_lowered m[:, 1] .= x

Error: expression is not a function call, or is too complex for @code_lowered to analyze; break it down to simpler parts if possible. In some cases, you may want to use Meta.@lower.

As suggested by the error message, I then used Meta.@lower.

Meta.@lower m[:, 1] .= x

:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = Base.dotview(m, :, 1)
│   %2 = Base.broadcasted(Base.identity, x)
│   %3 = Base.materialize!(%1, %2)
└──      return %3
))))

Now it becomes clear that the dot syntax triggered broadcasting, which was then translated to totally different operations internally. Nevertheless, I thought the compiler should be smart enough to treat the two syntaxes identically.
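To make the two paths concrete, here is a rough sketch that calls the lowered pieces by hand. It assumes the lowering shown by Meta.@lower above, and that `m[:, 1] = x` lowers to a `setindex!` call; the names `m1`, `m2`, `dest`, `bc` and `ret` are only for illustration. `Base.dotview` and `Base.materialize!` are internal functions, so this is for exploration rather than something to rely on in real code.

```julia
x = rand(4)
m1 = zeros(4, 2)
m2 = zeros(4, 2)

# Non-dot form: `m1[:, 1] = x` becomes a setindex! call.
setindex!(m1, x, :, 1)

# Dot form: `m2[:, 1] .= x`, following the Meta.@lower output.
dest = Base.dotview(m2, :, 1)          # a view of the first column of m2
bc   = Base.broadcasted(identity, x)   # typically a lazy Broadcasted object
ret  = Base.materialize!(dest, bc)     # copies the elements into dest

same_result  = (m1 == m2)              # both paths fill the column identically
returns_view = (ret isa SubArray)      # the dot form hands back the view
```

Both paths write the same values, so any timing difference comes from the code each path runs to do the copy, not from what is being computed.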

In summary, the non-dot version goes through Core._apply_iterate, while the dot version goes through Base.broadcasted. The time difference suggests that something in the broadcasting path leads to better-optimized machine code. However, I do not know the details at a deeper level. I hope this finding provides some clues.

I showed the place where the actual copying ends up happening. That place is different in the broadcast case and the setindex case. So there is no magic, just two different pieces of generic code (which in this case happen to do the same thing) with different performance.

Thank you, got it. However, why doesn’t the compiler generate the same efficient code for both styles? I would expect the compiler to reduce the non-dot version to the dot version automatically, since there seems to be no side effect.

Currently, we have to remember to use the dot broadcasting syntax even in this naive scenario. I often overlook it because it appears obvious to me that no memory allocation is involved and the operation is expected to be efficient enough.
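For reference, here is a small sketch of a few ways to write the same in-place column copy. `assign_col!` is a hypothetical helper written for this post, not something from Base. I have not included timings, since they depend on the Julia version and the machine; each variant can be measured with @btime as in the first post.

```julia
# Hypothetical helper (not part of Base): copy `src` into column `j` of `dest`
# with an explicit loop.
function assign_col!(dest::AbstractMatrix, j::Integer, src::AbstractVector)
    # Check shapes up front so the @inbounds loop below is safe.
    axes(dest, 1) == axes(src, 1) ||
        throw(DimensionMismatch("column and source have different axes"))
    j in axes(dest, 2) || throw(BoundsError(dest, (:, j)))
    @inbounds for i in eachindex(src)
        dest[i, j] = src[i]
    end
    return dest
end

x = rand(10)
a = zeros(10, 2); b = zeros(10, 2); c = zeros(10, 2)

a[:, 1] .= x                  # broadcast assignment
copyto!(view(b, :, 1), x)     # copyto! into a view of the column
assign_col!(c, 1, x)          # explicit loop

all_equal = (a == b == c)     # all three leave the matrices identical
```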

There could be many reasons; a compiler is, after all, just a program written by people.

I agree it would be good for these to perform the same. You could open an issue with your example.
