What is the quickest way to move a subsection of a matrix?

For example copy the second two columns to the first:

[1 2 3;
 1 2 3;
 1 2 3]


[2 3 3;
 2 3 3;
 2 3 3]

How about m[:, 1:2] .= view(m, :, 2:3)?

Correction: copyto!(m, CartesianIndices((1:3, 1:2)), m, CartesianIndices((1:3,2:3))) is faster:

julia> @btime copyto!($m, CartesianIndices((1:3, 1:2)), $m, CartesianIndices((1:3,2:3)));
  39.121 ns (1 allocation: 160 bytes)

julia> f(m) = m[:,1:2] .= view(m, :, 2:3); @btime f($m);
  68.940 ns (4 allocations: 320 bytes)

It is never clear to me whether the two regions of copyto! or .= can overlap, so I generally avoid this.

1 Like

Interesting! I believe copyto! takes care of unaliasing source and destination. Can you give an example where overlaps are a problem?

I think you misunderstand: since I don’t see this documented, this is something I don’t think I can rely on. Whether this works or not is not relevant, since if this is not part of the interface then it could silently stop working or introduce incorrect results.

Both the view and copyto! solutions suggested above allocate. Try this to avoid allocations:

copyto!(m, 1, m, 1 + size(m,1), (size(m,2) - 1) * size(m,1))

Or use a for loop.

1 Like

I totally get that, and I fully understand the concern: how can you copy overlapping memory portions without making an extra copy first? I’m actually not sure how copyto! does it. However I’d point out that the documentation of copyto! does not specify that source and destination have to be different, right?

@bennedich: nice one! Pity it only works for full columns, unlike the CartesianIndices approach.

I think it would be worthwhile to open an issue about clarifying this. I get the impression that it is supposed to work, eg

Also, currently copyto! uses the memmove C function for this, which does handle overlap.

1 Like

Yes, for other ranges, I’d just consider a for loop. Either loop over each individual element, or a loop of copyto!. It’s a bit frustrating how it’s sometimes so hard to avoid allocations for simple operations like this in Julia.

1 Like

Thanks everyone. I should have specified strictly non-allocating methods! Copyto seems like the way to go, I was imagining something like memmove existed but couldn’t find it.

Is it clear why copyto! with CartesianIndices allocates, unlike a simple for? It’s using a loop itself

Hey again,

I’ve checked that the bit that allocates when using copyto! with CartesianIndices is precisely the unalias part, which in this case needs to make a copy of m, I guess. The interesting question is then how can memmove (which gets called directly when using @bennedich’s copyto! solution) avoid any allocation, and whether we could do something similar, and avoid calling Base.unalias. I suspect it’s not completely trivial, as the critical difference is that memmove can exploit the fact that the memory chunk to be moved is contiguous, so it’s easy to avoid a copy, but when using CartesianIndices it’s a little bit more involved, I guess.

See also Understanding performance using `@btime` and `@code_warntype`, `@code_llvm`, etc, which appears to apply here.

You mean this?

julia> @btime copyto!($m, CartesianIndices((1:3, 1:2)), $m, CartesianIndices((1:3,2:3)));
  42.432 ns (1 allocation: 160 bytes)

julia> g(m) = copyto!(m, CartesianIndices((1:3, 1:2)), m, CartesianIndices((1:3, 2:3)))
g (generic function with 1 method)

julia> @btime g($m);
  42.460 ns (1 allocation: 160 bytes)

No, I mean the size check inside copyto! itself uses string interpolation. Factor that out into a separate function and it might get faster.

Ah, right! No, for some reason this doesn’t seem to allocate anything extra (in v1.1 at least)

julia> function risky_copyto!(dest::AbstractArray{T1,2}, Rdest::CartesianIndices{2},
                         src::AbstractArray{T2,2}, Rsrc::CartesianIndices{2}) where {T1,T2}
           ΔI = first(Rdest) - first(Rsrc)
           src′ = Base.unalias(dest, src)
           for I in Rsrc
               @inbounds dest[I + ΔI] = src′[I]

julia> g(m) = risky_copyto!(m, CartesianIndices((1:3, 1:2)), m, CartesianIndices((1:3, 2:3)))
g (generic function with 1 method)

julia> @btime g($m);
  37.906 ns (1 allocation: 160 bytes)

Right, the allocation came from the unalias, but I was wondering if the string interpolation might nevertheless be measurable.

It does seem that perhaps one could avoid the unalias call by using a branch that checks whether the first element of Rdest is within Rsrc; if so, do the copy in order of Iterators.reverse(Rsrc).


you may be can do like this:

julia> A=[1 2 3;1 2 3;1 2 3];

julia> B1=circshift(A,(0,2));

julia> B=hcat(B1[:,1:2],B1[:,2])
3×3 Array{Int64,2}:
 2  3  3
 2  3  3
 2  3  3

For unalias allocating: https://github.com/JuliaLang/julia/pull/26237.