Avoiding allocations from views: copyto! between arrays of different shape


#1

Say I have an n x m Matrix, and an n-length Vector that I want to copy to, say, column 2 of my matrix. My first try was to do copyto! using CartesianIndices.

m = rand(3,3)
v = rand(3)
copyto!(m, CartesianIndices((1:3, 2:2)), v, CartesianIndices(1:3))

But that fails. copyto! only works to copy a slice of an AbstractArray{T1,N} to another slice of an AbstractArray{T2,N}, i.e. between tensors with the same number of indices, despite the API very elegantly allows to express the more general case with N1 != N2 as above. What is the proper way to do this (for arbitrary N1, N2) in v1.0? Is there a reason why we would not want a more general copyto!?


#2

Something like the following (slight modification of the version in Base) seems to work for me. It’s such a minor modification that I suspect somebody had a good reason not to do it…

function copyto!(dest::AbstractArray{T1,N1}, Rdest::CartesianIndices{N1}, src::AbstractArray{T2,N2}, Rsrc::CartesianIndices{N2}) where {T1,T2,N1,N2}
    checkbounds(dest, first(Rdest))
    checkbounds(dest, last(Rdest))
    checkbounds(src, first(Rsrc))
    checkbounds(src, last(Rsrc))
    src′ = unalias(dest, src)
    for (Is, Id) in zip(Rsrc, Rdest)
        @inbounds dest[Id] = src′[Is]
    end
    return dest
end

#3

I might not understand your problem correctly, but is there a reason why e.g. X[2,:] .= Y does not work for you?


#4

No, that works perfectly well, and it’s probably the truly Julian approach. The (small) problem is that, if I understand correctly, that approach allocates a view, and is therefore somewhat slower than a direct copy. Until we have something like #14955 I prefer to avoid views in my particular application. But thanks for pointing that out!


#5

To illustrate

a = rand(3,3); b = rand(3,3);
f1(a,b) = a[1:2, 2:3] .= b[2:3, 1:2]
f2(a,b) = copyto!(a, CartesianIndices((1:2, 2:3)), b, CartesianIndices((2:3, 1:2)))

on v1.0.1

julia> @btime f1($a,$b);
  61.395 ns (2 allocations: 176 bytes)
julia> @btime f2($a,$b);
  15.850 ns (0 allocations: 0 bytes)

#6

Note, that it only allocates a view of you have a slice on the rhs of the assignment. For copying a full vector (or matrix), as in your original example, it should be efficient.


#7

Mmm, are you sure that’s always true? The following f1 also seems to allocate a view, although indeed the performance is now much closer to f2

a = rand(3,3); b = rand(2,2);
f1(a,b) = a[1:2, 2:3] .= b
f2(a,b) = copyto!(a, CartesianIndices((1:2, 2:3)), b, CartesianIndices((1:2, 1:2)))
julia> @btime f1($a,$b)
  22.696 ns (1 allocation: 64 bytes)
2×2 view(::Array{Float64,2}, 1:2, 2:3) with eltype Float64:
 0.408094   0.413918
 0.0414806  0.610879

julia> @btime f2($a,$b)
  17.146 ns (0 allocations: 0 bytes)
3×3 Array{Float64,2}:
 0.8406     0.408094   0.413918
 0.664731   0.0414806  0.610879
 0.0776809  0.810417   0.810417

#8

I’m sorry, what I said isn’t the full story. it is still creating a view for the lhs. Only no slices at all should be equally fast. (I’m on phone)


#9

Even in the case of fully overwriting a by b the copyto! approach is a bit faster:

julia> f2(a,b) = copyto!(a, CartesianIndices((1:3, 1:3)), b, CartesianIndices((1:3, 1:3)))
f2 (generic function with 1 method)

julia> f1(a,b) = a.=b;

julia> @btime f1($a, $b);
  17.313 ns (0 allocations: 0 bytes)

julia> @btime f2($a, $b);
  20.756 ns (0 allocations: 0 bytes)

I see no reason not to have a more general copyto!.


#10

Ref https://github.com/JuliaLang/julia/pull/29868.


#11

Your numbers suggest the opposite?


#12

Wow, wait, what? Did you just elide the view allocation with @inline? Does it also work with a slice on the right side?


#13

That PR makes the RHS slice a bit better:

julia> a = rand(3,3); b = rand(3,3);

julia> f1(a,b) = (a[1:2, 2:3] .= b[2:3, 1:2]; nothing)

Before

julia> @btime f1($a,$b);
  79.742 ns (2 allocations: 176 bytes)

After

julia> @btime f1($a,$b);
  64.168 ns (1 allocation: 112 bytes)

There is still an allocation happening because equality between AbstractArray and SubArray is not inlined.

Actually I am not sure why.


#14

Apparently I confused myself with the order :slight_smile:


#15

It sure would be great if could work as efficiently as copyto!. Much simpler, and nicer, I would gladly use .= always and stop worrying about hidden allocations.