```julia
using LinearAlgebra, BenchmarkTools

aaaa = zeros(1000,1000)
bbbb = ones(1,10000)
cccc = ones(10000,1000)
range = [1]
@btime mul!(@view(aaaa[range,:]), bbbb, cccc);
# BLAS.get_num_threads() == 1:  27.416 ms (3 allocations: 112 bytes)
# BLAS.get_num_threads() == 48: 27.514 ms (3 allocations: 112 bytes)
```
```julia
aaaa = zeros(1000,1000)
bbbb = ones(1,10000)
cccc = ones(10000,1000)
range = 1:1
@btime mul!(@view(aaaa[range,:]), bbbb, cccc);
# BLAS.get_num_threads() == 1:  11.358 ms (5 allocations: 224 bytes)
# BLAS.get_num_threads() == 48: 399.946 μs (5 allocations: 224 bytes)
```
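For context, the two thread settings in the timings above can be switched with `BLAS.set_num_threads` (a minimal sketch; the 48-thread figure assumes a machine with that many cores available):

```julia
using LinearAlgebra

BLAS.set_num_threads(1)
println(BLAS.get_num_threads())  # 1

BLAS.set_num_threads(48)  # then rerun the @btime lines above
```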
`range = 1:1` is much faster than `range = [1]`, and with `1:1` the speed also scales with the number of BLAS threads. The broadcast version

```julia
@btime @view(aaaa[range,:]) .= bbbb * cccc;
```

shows no such difference between the two kinds of `range`. Why do these strange results occur?
Views indexed with a `Vector{Int}` are likely to hit slower paths than views indexed with a `UnitRange`, because their contents are less predictable:
```julia
julia> x = rand(5, 3);

julia> view(x, [1,3,2], :) isa StridedArray
false

julia> y = view(x, 1:3, :); y isa StridedArray
true

julia> strides(x) == strides(y) == (1, 5)
true
```
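Note that it is the index *type*, not the index values, that determines this: even a `Vector{Int}` that happens to hold contiguous indices produces a non-strided view, so `mul!` cannot forward it to BLAS. A small sketch to illustrate:

```julia
x = rand(5, 3)

# Same indices as 1:3, but stored in a Vector{Int}:
v = view(x, collect(1:3), :)
println(v isa StridedArray)                # false: generic fallback, not BLAS
println(view(x, 1:3, :) isa StridedArray)  # true: strided, eligible for BLAS
```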
If `1:1` is what you really want, then there are probably further improvements available by using vector types instead of 1×N matrices:
```julia
julia> @btime mul!(@view($aaaa[[1],:]), $bbbb, $cccc);  # slow path above
  24.499 ms (2 allocations: 64 bytes)

julia> @btime mul!(@view($aaaa[1:1,:]), $bbbb, $cccc);  # faster path above
  1.480 ms (0 allocations: 0 bytes)

julia> @btime mul!(@view($aaaa[1,:]), $cccc', $(vec(bbbb)));  # vector, not matrix
  589.125 μs (0 allocations: 0 bytes)
```
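As a sanity check (my own sketch, reusing the array shapes from above), the vector formulation writes the same numbers as the matrix one:

```julia
using LinearAlgebra

bbbb = ones(1, 10000)
cccc = ones(10000, 1000)
aaaa = zeros(1000, 1000)

# Matrix version: writes a 1×1000 row into a view of aaaa
mul!(@view(aaaa[1:1, :]), bbbb, cccc)

# Vector version: cccc' * vec(bbbb) produces a length-1000 vector
out = zeros(1000)
mul!(out, cccc', vec(bbbb))

println(out == aaaa[1, :])  # true: both give the same row
```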