Linear algebra performance issue

Hello all,

I need to compute a 3x3 matrix many, many times. It is given by P = J * sigma * inv(F)', where J is a scalar and sigma and F are 3x3 matrices (A' denotes the transpose of A).

What is the best way to compute this P as fast as possible in Julia?

Since the matrix size is very small, consider using SMatrix{3,3} from StaticArrays.jl
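For example, a minimal sketch (the values of J, sigma, and F are made up, just to illustrate the types):

using StaticArrays, LinearAlgebra

J     = 1.2                  # hypothetical scalar
sigma = @SMatrix rand(3, 3)  # hypothetical stress
F     = @SMatrix rand(3, 3)  # hypothetical deformation gradient

P = J * sigma * inv(F)'      # stays an SMatrix; no heap allocation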


Thanks, I did use that.

@timeit "2"            P     = J*sigma*inv(F)'  # convert to Piola Kirchoof stress

But the above way of computing P involves many allocations, as measured by TimerOutputs.jl.

If you use StaticArrays it should not allocate at all; if it does, there might be a type-stability issue in your code. It's hard to tell without an example to run.
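One way to check is @code_warntype (a sketch, assuming your computation lives in a function; compute_P is a hypothetical name):

using StaticArrays, InteractiveUtils

compute_P(J, sigma, F) = J * sigma * inv(F)'

J     = 1.0
sigma = @SMatrix rand(3, 3)
F     = @SMatrix rand(3, 3)

@code_warntype compute_P(J, sigma, F)  # abstract types in the output signal an instability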


No temporary allocations at all? If so, I will double check with @code_warntype. Thanks.

Hmm, it does appear to allocate a small amount; I'm not sure why:

julia> a
3×3 SArray{Tuple{3,3},Float64,2,9} with indices SOneTo(3)×SOneTo(3):
  0.873545   1.1805   2.00785
  1.26515    2.59622  0.654212
 -1.51945   -1.70419  0.517809

julia> b
3×3 SArray{Tuple{3,3},Float64,2,9} with indices SOneTo(3)×SOneTo(3):
  1.5845     1.17958     1.72889
 -0.974253  -1.15078    -0.647302
  1.12172    0.0169034   0.0549179

julia> @time sum(2*a/b)
  0.000004 seconds (3 allocations: 176 bytes)
0.2072010681879124

EDIT: sorry, my benchmarking was flawed (@time on an expression with non-interpolated globals measures spurious allocations); it does not allocate anything:

julia> @btime sum(2*$a/$b')
  26.254 ns (0 allocations: 0 bytes)

Thanks. I confirm your answer. No allocation.

Actually, I got allocations because one of these matrices is an MMatrix, not an SMatrix. The reason is:

sigma::MMatrix{3,3,Float64}
function modify_stress!(sigma, epsilon::SMatrix{3,3,Float64})
  sigma    .= (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) 
end

Then sigma gets modified by modify_stress!().
The thing is, I cannot use SMatrix for sigma because it does not allow sigma .= …, and if I write sigma = (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) instead, sigma does not get modified!

How could I solve this dilemma?

You do not need to update sigma in place when you use static arrays; simply return the new one:

function stress(epsilon::SMatrix{3,3,Float64})
  (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) 
end

I’m confused. Does this work for you? For me it gives an error:

ERROR: MethodError: no method matching length(::UniformScaling{Float64})

What’s the output?

(BTW, are you aware of the tr function? You can use tr(epsilon) instead of epsilon[1,1]+epsilon[2,2]+epsilon[3,3])

This works for me:

sigma .= tr(epsilon) * I(3)

But you should probably use SMatrix and just replace sigma, as @baggepinnen suggests.

Note that you probably want to compute P = J*(sigma/F') rather than forming inv(F) explicitly, as it will likely be both faster and more accurate.
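For example, a quick sanity check that the two forms agree (a sketch with random matrices):

using StaticArrays

sigma = @SMatrix rand(3, 3)
F     = @SMatrix rand(3, 3)
J     = 2.0

P1 = J * sigma * inv(F)'  # explicit inverse
P2 = J * (sigma / F')     # right division instead
P1 ≈ P2                   # should hold up to rounding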

To elaborate on this, it is usually better to write things in a functional way when dealing with StaticArrays. So instead of having modify_stress!, where you pass in the return value, you simply have compute_stress and pass in only the strain. This makes the code less stateful, you pass fewer arguments to functions, and it usually gets significantly faster.
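For example (a sketch of linear elasticity; lambda and mu here are placeholder Lamé constants standing in for your material parameters):

using StaticArrays, LinearAlgebra

function compute_stress(epsilon::SMatrix{3,3,Float64}, lambda, mu)
    # linear elasticity: sigma = lambda*tr(epsilon)*I + 2*mu*epsilon
    return lambda * tr(epsilon) * one(SMatrix{3,3,Float64,9}) + 2mu * epsilon
end

epsilon = @SMatrix rand(3, 3)
sigma   = compute_stress(epsilon, 1.0, 1.0)  # rebind sigma instead of mutating in place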

Since I see you are doing continuum mechanics, you could also take a look at Tensors.jl (https://github.com/Ferrite-FEM/Tensors.jl): efficient computations with symmetric and non-symmetric tensors, with support for automatic differentiation.


Yes, it works for me. My old code is:

function update_stress!(sigma::SMatrix{3,3,Float64},
						epsilon::SMatrix{3,3,Float64})
  #sigma    .= mat.lambda * (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) + 2.0 * mat.mu * epsilon

I did not know of the tr() function. Thanks.

Thanks a lot Kristoffer.

Great, will try that. thanks.

Hi Kristoffer,

Tensors.jl is pretty cool. I have a maybe-stupid question: for 3x3 matrices, is Tensors.jl faster than StaticArrays.jl? I did some tests and the speed seems to be the same, but I want to ask your expert opinion.

It should be pretty much the same overall. For 3x3 it might be faster in scalar products with other second-order tensors, because we use explicit SIMD instructions whereas StaticArrays.jl needs help from LLVM (and LLVM is bad at the 3x3 case).
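A rough way to see the difference for yourself (a sketch; timings vary by machine and package version):

using Tensors, StaticArrays, LinearAlgebra, BenchmarkTools

a, b = rand(Tensor{2,3}), rand(Tensor{2,3})
A, B = rand(SMatrix{3,3,Float64}), rand(SMatrix{3,3,Float64})

@btime $a ⊡ $b      # Tensors.jl double contraction (scalar product)
@btime dot($A, $B)  # the StaticArrays/LinearAlgebra equivalent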


Kristoffer, could you please help with this code?

using StaticArrays, Tensors, LinearAlgebra, BenchmarkTools

struct Solid3D_1
        F :: Vector{SMatrix{3,3,Float64,9}}  # deformation gradient F, 3x3 matrix
        ϵ :: Vector{SMatrix{3,3,Float64,9}}  # strain, 3x3 matrix
        σ :: Vector{SMatrix{3,3,Float64,9}}  # stress, 3x3 matrix
        parcount::Int64
        function Solid3D_1(count)
            strain    = fill(zeros(3,3),count)
            defor     = fill(zeros(3,3),count)
            stress    = fill(zeros(3,3),count)
            return new(defor,strain,stress,count)
        end
end

struct Solid3D_2
        F :: Vector{Tensor{2,3,Float64}}           # Tensor{order,dim,T<:Real}; deformation gradient F, 3x3
        ϵ :: Vector{SymmetricTensor{2,3,Float64}}  # strain, 3x3
        σ :: Vector{SymmetricTensor{2,3,Float64}}  # stress, 3x3
        parcount::Int64
        function Solid3D_2(count)
            defor    = zeros(Tensor{2,3}, count) #fill(zero(Tensor{2, 3}),         count)
            strain    = zeros(SymmetricTensor{2, 3},count)
            stress    = zeros(SymmetricTensor{2, 3},count)
            return new(defor,strain,stress,count)
        end
end

function main_static_arrays(solid)
        defor = solid.F
        epsi  = solid.ϵ
        stres = solid.σ
        for ip = 1:solid.parcount
            J             = det(defor[ip])
            Finv          = inv(defor[ip])    # no memory alloc
            P             = J*stres[ip]*Finv'  # convert to first Piola-Kirchhoff stress, no memory alloc
        end
end

function main_tensor(solid)
        defor = solid.F
        epsi  = solid.ϵ
        stres = solid.σ
        for ip = 1:solid.parcount
            #J             = det(defor[ip])
            Finv          = inv(defor[ip])    # no memory alloc
            #P             = J*stres[ip]⋅Finv'  # convert to first Piola-Kirchhoff stress, no memory alloc
        end
end

function main()
    parCount = 100000
    s1 = Solid3D_1(parCount)
    s2 = Solid3D_2(parCount)
    @btime main_static_arrays(s1)
    @btime main_tensor(s2)
end

main()

I got:

julia> include("Main_Test_Tensor.jl")
  48.933 μs (0 allocations: 0 bytes)
  3.037 ms (100000 allocations: 7.63 MiB)

Why?

From the docs example (https://kristofferc.github.io/Tensors.jl/stable/man/constructing_tensors/#Constructing-tensors-1), it looks like SymmetricTensor{order,dim,T} is an incomplete type specification, so the fields of Solid3D_2 are vectors with abstract element types. The complete type signatures would be F :: Vector{Tensor{2,3,Float64,9}} and ϵ :: Vector{SymmetricTensor{2,3,Float64,6}}, I guess.
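You can see the difference in the REPL:

julia> using Tensors

julia> isconcretetype(Tensor{2,3,Float64})
false

julia> isconcretetype(Tensor{2,3,Float64,9})
true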
Try the following definition:

struct Solid3D_2{T<:Tensor{2,3,<:AbstractFloat}, ST<:SymmetricTensor{2,3,<:AbstractFloat}, IT<:Integer}
        F :: Vector{T}   # deformation gradient F, 3x3
        ϵ :: Vector{ST}  # strain, 3x3
        σ :: Vector{ST}  # stress, 3x3
        parcount::IT
        function Solid3D_2(count)
            defor    = zeros(Tensor{2,3}, count) #fill(zero(Tensor{2, 3}),         count)
            strain    = zeros(SymmetricTensor{2, 3},count)
            stress    = zeros(SymmetricTensor{2, 3},count)
            T, ST = eltype(defor), eltype(strain)
            return new{T,ST,Int}(defor,strain,stress,count)
        end
end

Thanks Vasily, I used your suggestion, but without the parameterised struct:

function main_static_arrays(solid)
        defor = solid.F
        epsi  = solid.ϵ
        stres = solid.σ
        for ip = 1:solid.parcount
            J             = det(defor[ip])
            Finv          = inv(defor[ip])    # no memory alloc
            P             = J*stres[ip]*Finv'  # convert to first Piola-Kirchhoff stress, no memory alloc
            L             = Finv*Finv 
            D             = 0.5  * (L + L')  # no memory alloc
        end
end

function main_tensor(solid)
        defor = solid.F
        epsi  = solid.ϵ
        stres = solid.σ
        for ip = 1:solid.parcount
            J             = det(defor[ip])
            Finv          = inv(defor[ip])    # no memory alloc
            P             = J*stres[ip]⋅Finv'  # convert to first Piola-Kirchhoff stress, no memory alloc
            L             = Finv⋅Finv 
            D             = symmetric(L)  # no memory alloc
        end
end

function main1()
    parCount = 100000
    s1       = Solid3D_1(parCount)
    @btime main_static_arrays($s1)
end

function main2()
    parCount = 100000
    s2       = Solid3D_2(parCount)
    @btime main_tensor($s2)
end

main1()
main2()
julia> include("Main_Test_Tensor.jl")
  51.429 μs (0 allocations: 0 bytes)
  77.122 μs (0 allocations: 0 bytes)

It seems to me that symmetric(A) is slow: when I removed it, Tensors.jl was slightly faster.
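To isolate that, one could benchmark just the two variants (a sketch; the variable names are made up):

using Tensors, StaticArrays, BenchmarkTools

L_t = rand(Tensor{2,3})
L_s = rand(SMatrix{3,3,Float64})

@btime symmetric($L_t)       # Tensors.jl, returns a SymmetricTensor
@btime 0.5 * ($L_s + $L_s')  # manual symmetrisation with StaticArrays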