I agree with @Lanfeng that you are wasting most of your time allocating the matrices [ones(1,d+1);vertices']
and [zeros(1,d);eye(d)]
, and in an inefficient way. The first matrix you even allocate twice, for no reason.
If you for some reason do not like keeping two matrices G1
and G2
around, and you know for sure the dimensions of vertices
, you can get pretty much the same speedup with this code:
function stima3b(vertices)
T = eltype(vertices)
G1 = T[1 1 1; 0 0 0; 0 0 0]
G2 = T[0 0; 1 0; 0 1]
fd = factorial(2) # or just "2"
G1[2:3, :] = vertices'
G = G1 \ G2
return (G * G') * (det(G1) / fd)
end
julia> @benchmark stima3(vert)
BenchmarkTools.Trial:
samples: 10000
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 3.33 kb
allocs estimate: 51
minimum time: 13.99 μs (0.00% GC)
median time: 16.69 μs (0.00% GC)
mean time: 17.42 μs (2.28% GC)
maximum time: 4.02 ms (99.00% GC)
julia> @benchmark stima3b(vert)
BenchmarkTools.Trial:
samples: 10000
evals/sample: 9
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 1.84 kb
allocs estimate: 27
minimum time: 3.03 μs (0.00% GC)
median time: 3.26 μs (0.00% GC)
mean time: 3.65 μs (7.21% GC)
maximum time: 414.57 μs (97.47% GC)
Now, if you really want to speed things up, you should look into the package StaticArrays.jl
, which offers blazingly fast operations on small, fixed-size, arrays:
function stima3c{T}(vertices::SMatrix{3, 2, T})
G1 = @MMatrix T[1 1 1; 0 0 0; 0 0 0]
G2 = @SMatrix T[0 0; 1 0; 0 1]
fd = factorial(2)
G1[2:3, :] = vertices'
G = [G1\G2[:,1] G1\G2[:,2]] # StaticArrays does not support '\' on matrices
return (G * G') * (det(G1) / fd)
end
Here’s the benchmark for that:
julia> @benchmark stima3c(SMatrix{3,2}(vert))
BenchmarkTools.Trial:
samples: 10000
evals/sample: 911
time tolerance: 5.00%
memory tolerance: 1.00%
memory estimate: 176.00 bytes
allocs estimate: 3
minimum time: 114.00 ns (0.00% GC)
median time: 119.00 ns (0.00% GC)
mean time: 139.95 ns (9.73% GC)
maximum time: 2.96 μs (90.83% GC)
Thats >100x speedup over the original code, and you can still optimize further, e.g. by letting vert
be a SMatrix
in the first place, avoid transposition, and maybe allocating G1
and G2
outside the function.
I fact, I would consider whether to use StaticArrays.jl
throughout your code, since you can get dramatic speedups. Perhaps it is still a bit immature to let your entire codebase rely on it (don’t take my word for it), but you can at least use it here to fix your bottleneck.
One more note: Please use infix operators in the infix position. It makes your code so much easier to read Also, I don’t see any reason to use A_mul_Bt(G, G)
et. al. over G * G'
; the latter is much more readable, and should be as fast.