Strange performance results for mul! with views

using LinearAlgebra, BenchmarkTools
aaaa = zeros(1000,1000)
bbbb = ones(1,10000)
cccc = ones(10000,1000)
range = [1]
@btime mul!(@view(aaaa[range,:]),bbbb,cccc);


BLAS.get_num_threads() → 1
27.416 ms (3 allocations: 112 bytes)
BLAS.get_num_threads() → 48
27.514 ms (3 allocations: 112 bytes)

aaaa = zeros(1000,1000)
bbbb = ones(1,10000)
cccc = ones(10000,1000)
range = 1:1
@btime mul!(@view(aaaa[range,:]),bbbb,cccc);


BLAS.get_num_threads() → 1
11.358 ms (5 allocations: 224 bytes)
BLAS.get_num_threads() → 48
399.946 μs (5 allocations: 224 bytes)

range = 1:1 is much faster than range = [1].

@btime @view(aaaa[range,:]) .= bbbb*cccc;
does not show this difference.

Why do these strange results occur?

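Views indexed by a Vector{Int} are likely to hit slower paths than views indexed by a UnitRange, because their contents are less predictable. A vector can hold arbitrary indices, so the view has no fixed strides and cannot be passed to BLAS; mul! then falls back to a generic method, which would also explain why the BLAS thread count makes no difference for range = [1]. You can see the distinction in the types: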

julia> x = rand(5, 3);

julia> view(x, [1,3,2], :) isa StridedArray
false

julia> y = view(x, 1:3, :); y isa StridedArray
true

julia> strides(x) == strides(y) == (1, 5)
true
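
As a small self-contained check of the point above, the sketch below builds both kinds of view on the same row and confirms that they differ only in type, not in the result that mul! writes. The array sizes and variable names (A, B, C, v_vec, v_range) are illustrative assumptions, chosen small so the script runs instantly; it is not a benchmark.

```julia
using LinearAlgebra

A = zeros(4, 3)
B = ones(1, 5)
C = ones(5, 3)

v_vec   = view(A, [1], :)   # row selected by a Vector{Int}
v_range = view(A, 1:1, :)   # same row selected by a UnitRange

# Only the range-indexed view has fixed strides.
is_strided_vec   = v_vec isa StridedArray
is_strided_range = v_range isa StridedArray

# Both views alias row 1 of A, so either call writes the same product.
mul!(v_vec, B, C)
row_from_vec = A[1, :]

fill!(A, 0.0)
mul!(v_range, B, C)
row_from_range = A[1, :]

@show is_strided_vec is_strided_range
@show row_from_vec == row_from_range
```

So if the indices happen to be contiguous, swapping the vector for the equivalent range changes only which method mul! selects, not the answer.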

If a single row (1:1) is what you really want, then there are probably further improvements available by using vector types instead of 1×N matrices:

julia> @btime mul!(@view($aaaa[[1],:]), $bbbb, $cccc);  # slow path above
  24.499 ms (2 allocations: 64 bytes)

julia> @btime mul!(@view($aaaa[1:1,:]), $bbbb, $cccc);  # faster path above
  1.480 ms (0 allocations: 0 bytes)

julia> @btime mul!(@view($aaaa[1,:]), $cccc', $(vec(bbbb)));  # vector not matrix
  589.125 μs (0 allocations: 0 bytes)
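
The vector form works because (b*C)' == C' * b', so the row of the output can be treated as a plain vector. Here is a minimal sketch that checks the two forms agree, using small illustrative arrays (the names A, b, C and the sizes are assumptions, not the ones from the benchmark):

```julia
using LinearAlgebra

A = zeros(4, 3)
b = ones(1, 5)      # 1×N matrix, as in the question
C = ones(5, 3)
C[2, 3] = 4.0       # break the symmetry so a wrong transpose would show up

# Matrix form: (1×5) * (5×3) written into a 1×3 view of row 1.
mul!(view(A, 1:1, :), b, C)
matrix_result = A[1, :]

fill!(A, 0.0)

# Vector form: the same row as a length-3 vector view, filled with C' * vec(b).
mul!(view(A, 1, :), C', vec(b))
vector_result = A[1, :]

@show matrix_result ≈ vector_result
```

Note that vec(b) shares memory with b rather than copying it, so the conversion itself should be cheap.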
