Use an array of SMatrix
(from StaticArrays.jl), which is about 25× faster (update: 75× faster if I fix my type declaration) than your compute_inv
on my machine:
julia> using StaticArrays, BenchmarkTools
julia> A = rand(SMatrix{3,3,Float64}, 1000,4);
julia> @btime inv.($A);
143.487 μs (4002 allocations: 593.83 KiB)
For such small matrices, generic routines that work for any size of matrix have a lot of overhead. The advantage of StaticArrays is huge here because it lets you invoke an unrolled, optimized inversion routine specifically for 3×3 matrices.
(I’m still surprised that it is reporting 4002 allocations, however; not sure why it requires a heap allocation for each element.) Update: I should have used SMatrix{3,3,Float64,9}
as explained below.
(Note also that randn(1000,4,3,3)
puts the dimensions in the wrong order for locality — you probably want the 3x3 matrices to be contiguous in memory. Storing things as an array of StaticArrays gets you contiguity automatically.)