The constant factor of matrix–vector is only slightly better than left-division by a factorization, so (as I explained above) you have to be solving a lot of right-hand sides (proportional to the size of the matrix) before computing the matrix inversion is worth it.
For example, with a 1000 \times 1000 matrix:
julia> using BenchmarkTools, LinearAlgebra
julia> LinearAlgebra.BLAS.set_num_threads(1) # single-threaded benchmarking is simpler
julia> A = rand(1000,1000); b = rand(1000); x = similar(b);
julia> F = @btime lu($A);
18.455 ms (6 allocations: 7.64 MiB)
julia> A⁻¹ = @btime inv($A);
54.006 ms (10 allocations: 8.13 MiB)
julia> @btime ldiv!($x, $F, $b);
203.052 μs (0 allocations: 0 bytes)
julia> @btime mul!($x, $A⁻¹, $b);
159.604 μs (0 allocations: 0 bytes)
So, on my machine, you’d have to be solving around 800 right-hand sides (one by one) for the cost of a 1000x1000 matrix inversion to break even.
If you are solving lots of right hand sides at once, i.e. you are computing X = A^{-1} B where B is a matrix, then you need a lot more to hit the break-even point. On my machine the break-even comes at about 2700 simultaneous right-hand sides for a 1000x1000 matrix:
julia> n = 2700; B = rand(1000,n); X = similar(B);
julia> @btime ldiv!($B, $F, $B);
131.404 ms (4 allocations: 20.60 MiB)
julia> @btime mul!($X, $A⁻¹, $B);
95.540 ms (0 allocations: 0 bytes)
(so that the performance loss of ldiv!
vs mul!
cancels the performance loss of inv
vs lu
. Together, they both take about 150ms in total.)
PS. Obviously, if the matrix is tiny, e.g. 2x2 or 3x3, then the constant factors are completely different, especially if you are using StaticArrays.jl.