Contiguous setindex involving an adjoint is slower than non-contiguous for a Matrix

julia> a = zeros(500,500);

julia> b = zeros(500,500)';

julia> v = 1:500;

julia> @btime $a[1,:] = $v;
  711.515 ns (0 allocations: 0 bytes)

julia> @btime $b[1,:] = $v;
  808.444 ns (0 allocations: 0 bytes)

julia> @btime $a[:,1] = $v;
  695.041 ns (0 allocations: 0 bytes)

I don’t understand why the setindex! on b (b[1,:], which should be contiguous in memory) is slower than the non-contiguous setindex! on a (a[1,:]). If anything, I would have expected b[1,:] to be as fast as a[:,1], since both write to a contiguous column of the underlying Matrix.

Interestingly, the expected behavior is seen if the assignment is done through broadcasting:

julia> @btime $a[1,:] .= $v;
  744.464 ns (0 allocations: 0 bytes)

julia> @btime $b[1,:] .= $v;
  649.030 ns (0 allocations: 0 bytes)

julia> @btime $a[:,1] .= $v;
  651.577 ns (0 allocations: 0 bytes)

Why is there a difference if the broadcasting is left implicit?
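For readers comparing the two forms, here is a small sketch of what each one does. As far as I know, `b[1,:] = v` lowers to a `setindex!` call, while `b[1,:] .= v` broadcasts into a view of the same region, so the two go through different code paths even though they store the same values. The small 3×3 sizes and the names `c` and `d` are illustrative only, and this is not a benchmark.

```julia
using LinearAlgebra

v = 1:3

# Plain indexed assignment: lowers to setindex!(c, v, 1, :)
c = zeros(3, 3)'
c[1, :] = v

# Dotted assignment: roughly equivalent to broadcasting into a view
d = zeros(3, 3)'
view(d, 1, :) .= v      # same effect as d[1, :] .= v

c == d                  # both forms store the same values
```

Because the results are identical, any timing difference comes from which methods get dispatched to, not from the data being written.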


Yeah. This is really weird. It’s probably an unfortunate generic fallback.

[Retracted: You have it backwards: b[1,:] is non-contiguous, and a[:,1] is contiguous. Remember that Julia arrays are column-major.] Correction: b[1,:] is indeed contiguous because b is an adjoint/transpose of a Matrix.

b is the adjoint of a matrix, so the column indices of b should be contiguous?

Oh, right, I didn’t notice this, sorry.
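A quick way to check the contiguity claim is to look at how the adjoint wrapper maps onto its parent. The sketch below uses a small 4×4 example (the names `p` and `b` are illustrative): the parent is column-major, and writing to `b[1, j]` lands in `p[j, 1]`, so row 1 of `b` is column 1 of `p`.

```julia
using LinearAlgebra

p = zeros(4, 4)
b = p'                  # lazy Adjoint wrapper; shares memory with p

strides(p)              # (1, 4): moving down a column steps by one element

b[1, 3] = 7.0           # write through the wrapper
p[3, 1] == 7.0          # the value lands in column 1 of the parent

parent(b) === p         # the wrapper holds the same array, not a copy
```

So for a real element type, `b[1, :]` addresses the same memory as `p[:, 1]`, which is contiguous.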

It might not be crazy to overload something to make this fast, maybe by adding a method of _setindex!? #39467 did this for the corresponding views of transposed matrices, although the timings there differed by a factor of 4, while I get about 4/3 for this example.
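Until something like that exists, one user-side workaround is to write through the parent array directly. The sketch below is only an illustration: `setrow!` is a hypothetical helper, not part of Base or LinearAlgebra, and it is restricted to element types for which adjoint/transpose of an element is a no-op, so that row `i` of the wrapper is exactly column `i` of the parent. Whether this is actually faster would need to be benchmarked.

```julia
using LinearAlgebra

# Hypothetical helper (not a Base or LinearAlgebra function).
# Adjoint is limited to real elements (no conjugation needed);
# Transpose is limited to numbers (no recursive transpose needed).
const RowContiguous = Union{Adjoint{<:Real,<:AbstractMatrix},
                            Transpose{<:Number,<:AbstractMatrix}}

function setrow!(B::RowContiguous, i::Integer, v)
    parent(B)[:, i] = v     # contiguous column write on the parent
    return B
end

b = zeros(4, 4)'
setrow!(b, 2, 1:4)
b[2, :] == 1:4              # row 2 of b now holds 1, 2, 3, 4
```

This sidesteps the generic fallback rather than fixing it, so it is only useful in code that knows it is holding an Adjoint or Transpose.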