Overhead of view / reshape / transpose in linear algebra

An example: combining reshape + view with ' for transpose in mul! appears to have some overhead.

using Random
using LinearAlgebra
using BenchmarkTools
buf = zeros(1000)
v1 = reshape(view(buf, 1:100), (20, 5))
v2 = view(v1, 1:16, :)
m3 = zeros(5, 5)
m4 = zeros(5, 16);
rand!(v2); rand!(m3);
m2 = zeros(16, 5); m2 .= v2

Then benchmark:

julia> @btime mul!(m4, m3, v2')
  249.301 ns (1 allocation: 112 bytes)
julia> @btime mul!(m4, m3, m2')
  246.672 ns (1 allocation: 16 bytes)
julia> @btime BLAS.gemm!('N', 'T', 1.0, m3, m2, 0.0, m4)
  223.786 ns (0 allocations: 0 bytes)
julia> @btime BLAS.gemm!('N', 'T', 1.0, m3, v2, 0.0, m4)
  220.938 ns (0 allocations: 0 bytes)

Using BLAS.gemm! directly is nice, but mul! seems to allocate when ' is used for transpose, and when that happens the reshape + view version allocates much more (112 bytes vs. 16).

P.S.: When performing heavy calculations, Julia's GC sometimes seems inadequate (I occasionally see GC-related memory peaks). If we want to implement a memory pool ourselves, wouldn't reshape + view arguably be the easiest way?
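To make the memory-pool idea concrete, here is a minimal sketch of what a view-based pool might look like. The names `Pool`, `acquire!`, and `reset!` are hypothetical, not an existing API; this is just the reshape-on-a-view pattern from the setup code above, wrapped in a bump allocator:

```julia
# Hypothetical minimal memory pool: one flat buffer, handed out in chunks
# via view + reshape so no new storage is heap-allocated after the
# initial zeros() call.
struct Pool
    buf::Vector{Float64}
    offset::Base.RefValue{Int}   # index of the last used element
end
Pool(n::Int) = Pool(zeros(n), Ref(0))

# Hand out an m×n matrix backed by the next free chunk of buf.
function acquire!(p::Pool, m::Int, n::Int)
    start = p.offset[] + 1
    stop = p.offset[] + m * n
    stop <= length(p.buf) || error("pool exhausted")
    p.offset[] = stop
    reshape(view(p.buf, start:stop), (m, n))
end

# Free everything at once by rewinding the offset.
reset!(p::Pool) = (p.offset[] = 0; p)

pool = Pool(1000)
A = acquire!(pool, 20, 5)   # 20×5 matrix backed by pool.buf[1:100]
B = acquire!(pool, 5, 16)   # next chunk of the same buffer
```

Writing through `A` mutates `pool.buf` directly, which is exactly the sharing behavior (and the potential extra allocation in mul!) discussed above.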

You’re seeing benchmarking artifacts. You want to flag the arguments as arguments with $:

julia> @btime mul!($m4, $m3, $v2');
  147.178 ns (0 allocations: 0 bytes)

julia> @btime mul!($m4, $m3, $m2');
  146.382 ns (0 allocations: 0 bytes)

julia> @btime BLAS.gemm!('N', 'T', 1.0, $m3, $m2, 0.0, $m4);
  142.447 ns (0 allocations: 0 bytes)

BenchmarkTools tries to evaluate the expression you passed (warts and all) as though it were plopped inside a function body. If you just have m2 without the $, then it’s using it as a type-unstable non-constant global access. The $ makes it behave as though it was passed as an argument to the function body.
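The difference between the two forms can be seen with any non-constant global, not just the matrices above:

```julia
using BenchmarkTools

x = rand(100)    # non-constant global

# Without $: each evaluation looks x up as an untyped global,
# so timings include type-instability and possible allocations.
@btime sum(x)

# With $: x is interpolated into the benchmark, behaving as if it
# were a function argument with a known concrete type.
@btime sum($x)
</imports>
</imports>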


Thanks for the correction. I ran into this problem in practice and tried to construct an MWE; obviously I made a mistake and this is not one. I'll accept your answer for now and reopen the issue if I find a real MWE later.