Strange performance results about function mul! with views

TongTongYee · April 29, 2025, 3:40pm

using LinearAlgebra, BenchmarkTools

aaaa = zeros(1000,1000)
bbbb = ones(1,10000)
cccc = ones(10000,1000)
range = [1]
@btime mul!(@view(aaaa[range,:]),bbbb,cccc);

→
BLAS.get_num_threads() → 1
27.416 ms (3 allocations: 112 bytes)
BLAS.get_num_threads() → 48
27.514 ms (3 allocations: 112 bytes)

aaaa = zeros(1000,1000)
bbbb = ones(1,10000)
cccc = ones(10000,1000)
range = 1:1
@btime mul!(@view(aaaa[range,:]),bbbb,cccc);

→
BLAS.get_num_threads() → 1
11.358 ms (5 allocations: 224 bytes)
BLAS.get_num_threads() → 48
399.946 μs (5 allocations: 224 bytes)

range = 1:1 is much faster than range = [1].

By contrast, the broadcast assignment

@btime @view(aaaa[range,:]) .= bbbb*cccc;

shows no such difference between the two index types.

Why do these strange results occur?

mcabbott · April 29, 2025, 4:10pm

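Views with a Vector{Int} of indices are likely to hit slower paths than views with a UnitRange, because their contents are less predictable. A vector of indices can list rows in any order, so the view has no fixed strides and is not a StridedArray: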

julia> x = rand(5, 3);

julia> view(x, [1,3,2], :) isa StridedArray
false

julia> y = view(x, 1:3, :); y isa StridedArray
true

julia> strides(x) == strides(y) == (1, 5)
true
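
If the indices really do have to be a Vector{Int}, one possible workaround is to run mul! into a dense scratch matrix and then copy the result into the selected rows. This is only a sketch: the names A, B, C, idx and tmp are made up for illustration, the sizes are smaller than in the question, and whether it beats the direct call on a given machine is something to measure, not assume.

```julia
using LinearAlgebra

# Hypothetical, smaller stand-ins for aaaa, bbbb, cccc from the question.
A = zeros(100, 100)
B = ones(1, 1000)
C = ones(1000, 100)
idx = [1]                        # Vector{Int} indices, as in the slow case

# A view indexed by a Vector{Int} is not a StridedArray,
# while the dense scratch matrix below is.
dest = @view A[idx, :]
dest_is_strided = dest isa StridedArray

tmp = Matrix{Float64}(undef, length(idx), size(C, 2))
mul!(tmp, B, C)                  # dense destination
A[idx, :] .= tmp                 # copy the result into the selected rows
```

The extra copy costs one pass over length(idx) × size(C, 2) elements, which is small next to the product itself when the inner dimension is large.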

If 1:1 is what you really want, there are probably further improvements available from using vector types instead of 1×N matrices:

julia> @btime mul!(@view($aaaa[[1],:]), $bbbb, $cccc);  # slow path above
  24.499 ms (2 allocations: 64 bytes)

julia> @btime mul!(@view($aaaa[1:1,:]), $bbbb, $cccc);  # faster path above
  1.480 ms (0 allocations: 0 bytes)

julia> @btime mul!(@view($aaaa[1,:]), $cccc', $(vec(bbbb)));  # vector not matrix
  589.125 μs (0 allocations: 0 bytes)
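
Before swapping one form for another, it may be worth confirming that all three calls write the same values. A minimal check, assuming small made-up sizes n and k and separate destination matrices A1, A2, A3 (none of these names are from the thread):

```julia
using LinearAlgebra

# Small hypothetical sizes so the check runs quickly.
n, k = 6, 8
A1, A2, A3 = zeros(n, n), zeros(n, n), zeros(n, n)
B = rand(1, k)
C = rand(k, n)

mul!(@view(A1[[1], :]), B, C)       # Vector{Int} index: view is not a StridedArray
mul!(@view(A2[1:1, :]), B, C)       # UnitRange index: view is a StridedArray
mul!(@view(A3[1, :]), C', vec(B))   # scalar index: matrix-vector product into a 1-D view

# All three should fill row 1 with the same numbers and leave the rest at zero.
same = A1 ≈ A2 && A2 ≈ A3
```

The third form relies on (B*C)' == C'*B', so the 1×k matrix B becomes the length-k vector vec(B) and the destination becomes a plain row view.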
