Linear algebra performance issue

vinhphunguyen · July 1, 2020, 6:48am

Hello all,

I need to compute a 3x3 matrix many many times, it is given by P = J x sigma x inv(F)‘, where J is a scalar, sigma and F are 3x3 matrices; A’ => transpose of A.

What is the best way to compute this P as fast as possible in Julia?

baggepinnen · July 1, 2020, 7:07am

Since the matrix size is very small, consider using SMatrix{3,3} from StaticArrays.jl

vinhphunguyen · July 1, 2020, 7:12am

Thanks, I did use that.

@timeit "2"            P     = J*sigma*inv(F)'  # convert to Piola Kirchoof stress

But the above way of computing P involves many allocation, as measured by TimerOutput.jl.

baggepinnen · July 1, 2020, 7:14am

If you use StaticArrays it should not allocate at all, if it does, there might be some type-stability issue in your code. It’s hard to tell without having an example to run.

vinhphunguyen · July 1, 2020, 7:20am

No temporary allocations at all? If so, I will double check with @code_warntype. Thanks.

baggepinnen · July 1, 2020, 7:24am

Hmm, ~~it does appear to allocate a small amount, I’m not sure why~~

julia> a
3×3 SArray{Tuple{3,3},Float64,2,9} with indices SOneTo(3)×SOneTo(3):
  0.873545   1.1805   2.00785
  1.26515    2.59622  0.654212
 -1.51945   -1.70419  0.517809

julia> b
3×3 SArray{Tuple{3,3},Float64,2,9} with indices SOneTo(3)×SOneTo(3):
  1.5845     1.17958     1.72889
 -0.974253  -1.15078    -0.647302
  1.12172    0.0169034   0.0549179

julia> @time sum(2*a/b)
  0.000004 seconds (3 allocations: 176 bytes)
0.2072010681879124

EDIT: sorry, my benchmarking was flawed, it does not allocate anything

julia> @btime sum(2*$a/$b')
  26.254 ns (0 allocations: 0 bytes)

vinhphunguyen · July 1, 2020, 7:41am

Thanks. I confirm your answer. No allocation.

vinhphunguyen · July 1, 2020, 9:35am

Actually, I got allocations because one of these matrices is MMatrix, not SMatrix. The reason is:

sigma::MMatrix{Float64,3,3}
function modify_stress!(sigma, epsilon::SMatrix{3,3,Float64})
  sigma    .= (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) 
end

Then, sigma got modified by function modify_stress!().
The thing is I cannot use SMatrix for sigma because it does not allow: sigma .= … By writing sigma = (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.), sigma did not get modified!!!

How could I solve this dilemma?

baggepinnen · July 1, 2020, 9:41am

You do not need to update sigma in place when you use static arrays, simply return the new one

function stress(epsilon::SMatrix{3,3,Float64})
  (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) 
end

DNF · July 1, 2020, 10:55am

I’m confused. Does this work for you? For me it gives an error:

ERROR: MethodError: no method matching length(::UniformScaling{Float64})

What’s the output?

(BTW, are you aware of the tr function? You can use tr(epsilon) instead of epsilon[1,1]+epsilon[2,2]+epsilon[3,3])

This works for me:

sigma .= tr(epsilon) * I(3)

But you should probably use SMatrix and just replace sigma, as @baggepinnen suggests.

Oscar_Smith · July 1, 2020, 3:51pm

Note that you probably want to solve P=J*sigma \ (F') as it will be likely both faster and more accurate.

kristoffer.carlsson · July 1, 2020, 3:59pm

To elaborate on this, it is usually better to write things in a functional way when dealing with StaticArrays. So instead of having modify_stress! where you pass in a return value you simply have compute_stress and you only pass in the strain. This makes the code less stateful and you pass less arguments to function and it usually get significantly faster.

Since I see you are doing continuum mechanics, you could also take a look at GitHub - Ferrite-FEM/Tensors.jl: Efficient computations with symmetric and non-symmetric tensors with support for automatic differentiation..

vinhphunguyen · July 3, 2020, 4:36pm

Yes, it works for me, my old code is:

function update_stress!(sigma::SMatrix{3,3,Float64},
						epsilon::SMatrix{3,3,Float64})
  #sigma    .= mat.lambda * (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) + 2.0 * mat.mu * epsilon

I did not know of the tr() function. Thanks.

vinhphunguyen · July 3, 2020, 4:38pm

Thanks a lot Kristoffer.

vinhphunguyen · July 3, 2020, 4:39pm

Great, will try that. thanks.

vinhphunguyen · July 4, 2020, 3:01am

HI Kristogger,

Tensor.jl is pretty cool. I have may be a stupid question, for 3x3 matrices, Tensor.jl is faster than StaticArrays.jl? I did some tests and the speed seems to be the same. But I want to ask your expert opinion.

kristoffer.carlsson · July 5, 2020, 5:25pm

Should be pretty much the same overall. For 3x3 it might be faster in scalar products with other second order tensors because we use explicit SIMD instructions whereas StaticArrays.jl need help from LLVM (and LLVM is bad for the 3x3 case)

vinhphunguyen · July 10, 2020, 9:45am

Carlsson, could u please help with this code?

struct Solid3D_1
        F :: Vector{SMatrix{3,3,Float64,9}}  # F, 2x2 matrix
        ϵ :: Vector{SMatrix{3,3,Float64,9}}  # F, 2x2 matrix
        σ :: Vector{SMatrix{3,3,Float64,9}}  # F, 2x2 matrix
        parcount::Int64
        function Solid3D_1(count)
            strain    = fill(zeros(3,3),count)
            defor     = fill(zeros(3,3),count)
            stress    = fill(zeros(3,3),count)
            return new(defor,strain,stress,count)
        end
end

struct Solid3D_2
        F :: Vector{Tensor{2,3,Float64}}           # Tensor{order,dim,T<:Real} F, 2x2 matrix
        ϵ :: Vector{SymmetricTensor{2,3,Float64}}  # Tensor{order,dim,T<:Real} F, 2x2 matrix
        σ :: Vector{SymmetricTensor{2,3,Float64}}  # F, 2x2 matrix
        parcount::Int64
        function Solid3D_2(count)
            defor    = zeros(Tensor{2,3}, count) #fill(zero(Tensor{2, 3}),         count)
            strain    = zeros(SymmetricTensor{2, 3},count)
            stress    = zeros(SymmetricTensor{2, 3},count)
            return new(defor,strain,stress,count)
        end
end

function main_static_arrays(solid)
        defor = solid.F
        epsi  = solid.ϵ
        stres = solid.σ
        for ip = 1:solid.parcount
            J             = det(defor[ip])
            Finv          = inv(defor[ip])    # no memory alloc
            P             = J*stres[ip]*Finv'  # convert to Piola Kirchoof stress, no memory alloc
        end
end

function main_tensor(solid)
        defor = solid.F
        epsi  = solid.ϵ
        stres = solid.σ
        for ip = 1:solid.parcount
            #J             = det(defor[ip])
            Finv          = inv(defor[ip])    # no memory alloc
            #P             = J*stres[ip]⋅Finv'  # convert to Piola Kirchoof stress, no memory alloc
        end
end

function main()
parCount = 100000
s1 = Solid3D_1(parCount)
s2 = Solid3D_2(parCount)
@btime main_static_arrays(s1)
@btime main_tensor(s2)
end

main()

I got:

julia> include("Main_Test_Tensor.jl")
  48.933 μs (0 allocations: 0 bytes)
  3.037 ms (100000 allocations: 7.63 MiB)

Why?

Vasily_Pisarev · July 10, 2020, 10:08am

From the docs example (https://kristofferc.github.io/Tensors.jl/stable/man/constructing_tensors/#Constructing-tensors-1), it looks like SymmetricTensor{order,dim,T} is incomplete type specification, so that the elements of Solid3D_2 are vectors with abstract element types. The complete type signature must be F :: Vector{Tensor{2,3,Float64,9}} and ϵ :: Vector{SymmetricTensor{2,3,Float64,6}}, I guess.
Try the following definition:

struct Solid3D_2{T<:Tensor{2,3,<:AbstractFloat}, ST<:SymmetricTensor{2,3,<:AbstractFloat}, IT<:Integer}
        F :: Vector{T}           # Tensor{order,dim,T<:Real} F, 2x2 matrix
        ϵ :: Vector{ST}  # Tensor{order,dim,T<:Real} F, 2x2 matrix
        σ :: Vector{ST}  # F, 2x2 matrix
        parcount::IT
        function Solid3D_2(count)
            defor    = zeros(Tensor{2,3}, count) #fill(zero(Tensor{2, 3}),         count)
            strain    = zeros(SymmetricTensor{2, 3},count)
            stress    = zeros(SymmetricTensor{2, 3},count)
            T, ST = eltype.(defor, strain) 
            return new{T,ST,Int}(defor,strain,stress,count)
        end
end

vinhphunguyen · July 10, 2020, 12:46pm

Thanks Vasily, I used your suggestion, but not a parameterised struct:

function main_static_arrays(solid)
        defor = solid.F
        epsi  = solid.ϵ
        stres = solid.σ
        for ip = 1:solid.parcount
            J             = det(defor[ip])
            Finv          = inv(defor[ip])    # no memory alloc
            P             = J*stres[ip]*Finv'  # convert to Piola Kirchoof stress, no memory alloc
            L             = Finv*Finv 
            D             = 0.5  * (L + L')  # no memory alloc
        end
end

function main_tensor(solid)
        defor = solid.F
        epsi  = solid.ϵ
        stres = solid.σ
        for ip = 1:solid.parcount
            J             = det(defor[ip])
            Finv          = inv(defor[ip])    # no memory alloc
            P             = J*stres[ip]⋅Finv'  # convert to Piola Kirchoof stress, no memory alloc
            L             = Finv⋅Finv 
            D             = symmetric(L)  # no memory alloc
        end
end

function main1()
    parCount = 100000
    s1       = Solid3D_1(parCount)
    @btime main_static_arrays($s1)
end

function main2()
    parCount = 100000
    s2       = Solid3D_2(parCount)
    @btime main_tensor($s2)
end

main1()
main2()

julia> include("Main_Test_Tensor.jl")
  51.429 μs (0 allocations: 0 bytes)
  77.122 μs (0 allocations: 0 bytes)

Seems to me symmetric(A) is slow as I removed it, Tensors.jl was faster slightly.

Topic		Replies	Views
Recent developments in StaticArrays.jl land Package Announcements performance , array , linearalgebra , arrays	12	3719	March 11, 2021
Matrix construction operations - increasing speed New to Julia performance	13	1074	July 5, 2018
Computing Inverse of a stack of matrices General Usage performance , linearalgebra	19	3604	August 29, 2020
Performance of mutable static arrays and compilation cost General Usage	15	1188	July 17, 2018
Creating a Static Array with a loop General Usage	10	1007	April 12, 2022

Linear algebra performance issue

Related topics