Is it worth trying to speed up setindex for arrays of numbers where an array is overwritten with another?

jishnub · June 10, 2021, 10:05am

Arrays of numbers (often with isbits element types) are fairly common in numerical work, e.g. Array{Float64}.

I’m thinking of an operation such as

julia> a = ones(1000, 1000);

julia> b = copy(a);

julia> @btime $a[:,:] = $b;
  1.877 ms (0 allocations: 0 bytes)

julia> @btime copyto!($a, $b);
  985.943 μs (0 allocations: 0 bytes)

# Bounds checking isn't what slows this down either; perhaps it's a lack of vectorization?
julia> f(a, b) = @inbounds a[:,:] = b;

julia> @btime f($a, $b);
  1.870 ms (0 allocations: 0 bytes)

# broadcasted setindex with a view is faster, but still much slower than copyto
julia> @btime $a[:,:] .= $b;
  1.572 ms (0 allocations: 0 bytes)

# direct in-place broadcasting is as fast as copyto!
julia> @btime $a .= $b;
  983.905 μs (0 allocations: 0 bytes)

The setindex! in the first case is equivalent to the broadcasted versions, yet broadcasting into the view is somewhat faster and direct in-place broadcasting is roughly twice as fast. There are also cases where the opposite holds, e.g.

julia> @btime $a[1:size($a,1), 1:size($a,2)] = $b;
  1.873 ms (0 allocations: 0 bytes)

julia> @btime $a[1:size($a,1), 1:size($a,2)] .= $b;
  3.982 ms (0 allocations: 0 bytes)

The difference here is that the view is a SlowSubArray in this case while it was a FastSubArray in the former.
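The distinction can be checked without timing anything, since it shows up in the indexing style of the two views. This is only a small sketch on a 4×4 array; the names v_colon and v_range are made up for illustration.

```julia
a = ones(4, 4)

# The view that `a[:, :] .= b` broadcasts into
v_colon = view(a, :, :)
# The view that `a[1:size(a,1), 1:size(a,2)] .= b` broadcasts into
v_range = view(a, 1:size(a, 1), 1:size(a, 2))

# Colon-only views keep linear indexing into the parent array
IndexStyle(v_colon)   # IndexLinear()

# Views indexed by two ranges fall back to Cartesian indexing,
# even though they cover the whole parent here
IndexStyle(v_range)   # IndexCartesian()
```

Both views cover the same memory, but only the first one is known to do so from its type alone.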

Does it make sense to identify cases where a potentially slow operation can be replaced by a faster one, and use the faster implementation instead? I’m not sure how generic this would be, but it might improve performance in common applications. Otherwise, users have to remember which operation is faster in each scenario to get optimal performance.
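As a rough illustration of the kind of special-casing I have in mind, here is a minimal sketch. The function overwrite! is hypothetical and not part of Base, and it only handles the simplest case: two Arrays with the same isbits element type and the same size.

```julia
# Hypothetical helper (not in Base): overwrite `dest` with the contents of `src`,
# taking the fast path when a flat copy is known to be valid.
function overwrite!(dest::Array{T,N}, src::Array{T,N}) where {T,N}
    if isbitstype(T) && size(dest) == size(src)
        # Same element type and shape, so copyto! can copy the memory directly
        return copyto!(dest, src)
    end
    # Generic fallback, equivalent to dest[:, :, ...] = src
    dest[ntuple(_ -> Colon(), Val(N))...] = src
    return dest
end

a = zeros(3, 3)
b = rand(3, 3)
overwrite!(a, b)
a == b   # true
```

Whether such a check belongs inside setindex! itself, or whether the generic code path should instead be made as fast as copyto!, is exactly the open question.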

Ideally all these operations would behave identically, but I’m not sure if it’s easy to get there.

I had posted an issue about this, which seemingly didn’t get much attention, so I thought I’d discuss it here.
