# Linear algebra performance issue

Hello all,

I need to compute a 3x3 matrix many many times, it is given by P = J x sigma x inv(F)’, where J is a scalar, sigma and F are 3x3 matrices; A’ => transpose of A.

What is the best way to compute this P as fast as possible in Julia?

Since the matrix size is very small, consider using `SMatrix{3,3}` from StaticArrays.jl

1 Like

Thanks, I did use that.

``````@timeit "2"            P     = J*sigma*inv(F)'  # convert to Piola Kirchoof stress
``````

But the above way of computing P involves many allocation, as measured by TimerOutput.jl.

If you use StaticArrays it should not allocate at all, if it does, there might be some type-stability issue in your code. It’s hard to tell without having an example to run.

1 Like

No temporary allocations at all? If so, I will double check with @code_warntype. Thanks.

Hmm, it does appear to allocate a small amount, I’m not sure why

``````julia> a
3×3 SArray{Tuple{3,3},Float64,2,9} with indices SOneTo(3)×SOneTo(3):
0.873545   1.1805   2.00785
1.26515    2.59622  0.654212
-1.51945   -1.70419  0.517809

julia> b
3×3 SArray{Tuple{3,3},Float64,2,9} with indices SOneTo(3)×SOneTo(3):
1.5845     1.17958     1.72889
-0.974253  -1.15078    -0.647302
1.12172    0.0169034   0.0549179

julia> @time sum(2*a/b)
0.000004 seconds (3 allocations: 176 bytes)
0.2072010681879124
``````

EDIT: sorry, my benchmarking was flawed, it does not allocate anything

``````julia> @btime sum(2*\$a/\$b')
26.254 ns (0 allocations: 0 bytes)
``````
1 Like

Thanks. I confirm your answer. No allocation.

Actually, I got allocations because one of these matrices is MMatrix, not SMatrix. The reason is:

``````sigma::MMatrix{Float64,3,3}
function modify_stress!(sigma, epsilon::SMatrix{3,3,Float64})
sigma    .= (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.)
end
``````

Then, sigma got modified by function modify_stress!().
The thing is I cannot use SMatrix for sigma because it does not allow: sigma .= … By writing sigma = (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.), sigma did not get modified!!!

How could I solve this dilemma?

You do not need to update sigma in place when you use static arrays, simply return the new one

``````function stress(epsilon::SMatrix{3,3,Float64})
(epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.)
end
``````
1 Like

I’m confused. Does this work for you? For me it gives an error:

``````ERROR: MethodError: no method matching length(::UniformScaling{Float64})
``````

What’s the output?

(BTW, are you aware of the `tr` function? You can use `tr(epsilon)` instead of `epsilon[1,1]+epsilon[2,2]+epsilon[3,3]`)

This works for me:

``````sigma .= tr(epsilon) * I(3)
``````

But you should probably use `SMatrix` and just replace `sigma`, as @baggepinnen suggests.

Note that you probably want to solve `P=J*sigma \ (F')` as it will be likely both faster and more accurate.

To elaborate on this, it is usually better to write things in a functional way when dealing with StaticArrays. So instead of having `modify_stress!` where you pass in a return value you simply have `compute_stress` and you only pass in the strain. This makes the code less stateful and you pass less arguments to function and it usually get significantly faster.

Since I see you are doing continuum mechanics, you could also take a look at https://github.com/KristofferC/Tensors.jl.

3 Likes

Yes, it works for me, my old code is:

``````function update_stress!(sigma::SMatrix{3,3,Float64},
epsilon::SMatrix{3,3,Float64})
#sigma    .= mat.lambda * (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) + 2.0 * mat.mu * epsilon
``````

I did not know of the tr() function. Thanks.

Thanks a lot Kristoffer.

Great, will try that. thanks.

HI Kristogger,

Tensor.jl is pretty cool. I have may be a stupid question, for 3x3 matrices, Tensor.jl is faster than StaticArrays.jl? I did some tests and the speed seems to be the same. But I want to ask your expert opinion.

Should be pretty much the same overall. For 3x3 it might be faster in scalar products with other second order tensors because we use explicit SIMD instructions whereas StaticArrays.jl need help from LLVM (and LLVM is bad for the 3x3 case)

1 Like

``````struct Solid3D_1
F :: Vector{SMatrix{3,3,Float64,9}}  # F, 2x2 matrix
ϵ :: Vector{SMatrix{3,3,Float64,9}}  # F, 2x2 matrix
σ :: Vector{SMatrix{3,3,Float64,9}}  # F, 2x2 matrix
parcount::Int64
function Solid3D_1(count)
strain    = fill(zeros(3,3),count)
defor     = fill(zeros(3,3),count)
stress    = fill(zeros(3,3),count)
return new(defor,strain,stress,count)
end
end

struct Solid3D_2
F :: Vector{Tensor{2,3,Float64}}           # Tensor{order,dim,T<:Real} F, 2x2 matrix
ϵ :: Vector{SymmetricTensor{2,3,Float64}}  # Tensor{order,dim,T<:Real} F, 2x2 matrix
σ :: Vector{SymmetricTensor{2,3,Float64}}  # F, 2x2 matrix
parcount::Int64
function Solid3D_2(count)
defor    = zeros(Tensor{2,3}, count) #fill(zero(Tensor{2, 3}),         count)
strain    = zeros(SymmetricTensor{2, 3},count)
stress    = zeros(SymmetricTensor{2, 3},count)
return new(defor,strain,stress,count)
end
end

function main_static_arrays(solid)
defor = solid.F
epsi  = solid.ϵ
stres = solid.σ
for ip = 1:solid.parcount
J             = det(defor[ip])
Finv          = inv(defor[ip])    # no memory alloc
P             = J*stres[ip]*Finv'  # convert to Piola Kirchoof stress, no memory alloc
end
end

function main_tensor(solid)
defor = solid.F
epsi  = solid.ϵ
stres = solid.σ
for ip = 1:solid.parcount
#J             = det(defor[ip])
Finv          = inv(defor[ip])    # no memory alloc
#P             = J*stres[ip]⋅Finv'  # convert to Piola Kirchoof stress, no memory alloc
end
end

function main()
parCount = 100000
s1 = Solid3D_1(parCount)
s2 = Solid3D_2(parCount)
@btime main_static_arrays(s1)
@btime main_tensor(s2)
end

main()
``````

I got:

``````julia> include("Main_Test_Tensor.jl")
48.933 μs (0 allocations: 0 bytes)
3.037 ms (100000 allocations: 7.63 MiB)
``````

Why?

From the docs example (https://kristofferc.github.io/Tensors.jl/stable/man/constructing_tensors/#Constructing-tensors-1), it looks like `SymmetricTensor{order,dim,T}` is incomplete type specification, so that the elements of `Solid3D_2` are vectors with abstract element types. The complete type signature must be `F :: Vector{Tensor{2,3,Float64,9}}` and `ϵ :: Vector{SymmetricTensor{2,3,Float64,6}}`, I guess.
Try the following definition:

``````struct Solid3D_2{T<:Tensor{2,3,<:AbstractFloat}, ST<:SymmetricTensor{2,3,<:AbstractFloat}, IT<:Integer}
F :: Vector{T}           # Tensor{order,dim,T<:Real} F, 2x2 matrix
ϵ :: Vector{ST}  # Tensor{order,dim,T<:Real} F, 2x2 matrix
σ :: Vector{ST}  # F, 2x2 matrix
parcount::IT
function Solid3D_2(count)
defor    = zeros(Tensor{2,3}, count) #fill(zero(Tensor{2, 3}),         count)
strain    = zeros(SymmetricTensor{2, 3},count)
stress    = zeros(SymmetricTensor{2, 3},count)
T, ST = eltype.(defor, strain)
return new{T,ST,Int}(defor,strain,stress,count)
end
end
``````

Thanks Vasily, I used your suggestion, but not a parameterised struct:

``````function main_static_arrays(solid)
defor = solid.F
epsi  = solid.ϵ
stres = solid.σ
for ip = 1:solid.parcount
J             = det(defor[ip])
Finv          = inv(defor[ip])    # no memory alloc
P             = J*stres[ip]*Finv'  # convert to Piola Kirchoof stress, no memory alloc
L             = Finv*Finv
D             = 0.5  * (L + L')  # no memory alloc
end
end

function main_tensor(solid)
defor = solid.F
epsi  = solid.ϵ
stres = solid.σ
for ip = 1:solid.parcount
J             = det(defor[ip])
Finv          = inv(defor[ip])    # no memory alloc
P             = J*stres[ip]⋅Finv'  # convert to Piola Kirchoof stress, no memory alloc
L             = Finv⋅Finv
D             = symmetric(L)  # no memory alloc
end
end

function main1()
parCount = 100000
s1       = Solid3D_1(parCount)
@btime main_static_arrays(\$s1)
end

function main2()
parCount = 100000
s2       = Solid3D_2(parCount)
@btime main_tensor(\$s2)
end

main1()
main2()
``````
``````julia> include("Main_Test_Tensor.jl")
51.429 μs (0 allocations: 0 bytes)
77.122 μs (0 allocations: 0 bytes)
``````

Seems to me symmetric(A) is slow as I removed it, Tensors.jl was faster slightly.