Hello all,
I need to compute a 3x3 matrix many, many times. It is given by P = J * sigma * inv(F)', where J is a scalar, and sigma and F are 3x3 matrices; A' denotes the transpose of A.
What is the best way to compute this P as fast as possible in Julia?
Since the matrix size is very small, consider using SMatrix{3,3} from StaticArrays.jl.
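For reference, a minimal sketch of what that looks like (the random F and sigma here are just placeholders for whatever your simulation produces):
using StaticArrays, LinearAlgebra
F = @SMatrix rand(3, 3)     # placeholder deformation gradient
sigma = @SMatrix rand(3, 3) # placeholder Cauchy stress
J = det(F)
P = J * sigma * inv(F)'     # stays entirely on the stack: no heap allocations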
Thanks, I did use that.
@timeit "2" P = J*sigma*inv(F)' # convert to Piola Kirchoof stress
But the above way of computing P involves many allocation, as measured by TimerOutput.jl.
If you use StaticArrays it should not allocate at all, if it does, there might be some type-stability issue in your code. It’s hard to tell without having an example to run.
No temporary allocations at all? If so, I will double check with @code_warntype. Thanks.
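For example, wrapping the computation in a small function (compute_P is just a hypothetical name here) makes it easy to inspect with @code_warntype:
julia> using StaticArrays, LinearAlgebra

julia> compute_P(J, sigma, F) = J * sigma * inv(F)'

julia> F = @SMatrix rand(3,3); sigma = @SMatrix rand(3,3);

julia> @code_warntype compute_P(det(F), sigma, F)  # any red/abstract types indicate instability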
Hmm, it does appear to allocate a small amount; I’m not sure why.
julia> a
3×3 SArray{Tuple{3,3},Float64,2,9} with indices SOneTo(3)×SOneTo(3):
0.873545 1.1805 2.00785
1.26515 2.59622 0.654212
-1.51945 -1.70419 0.517809
julia> b
3×3 SArray{Tuple{3,3},Float64,2,9} with indices SOneTo(3)×SOneTo(3):
1.5845 1.17958 1.72889
-0.974253 -1.15078 -0.647302
1.12172 0.0169034 0.0549179
julia> @time sum(2*a/b)
0.000004 seconds (3 allocations: 176 bytes)
0.2072010681879124
EDIT: sorry, my benchmarking was flawed, it does not allocate anything
julia> @btime sum(2*$a/$b')
26.254 ns (0 allocations: 0 bytes)
Thanks. I confirm your answer. No allocation.
Actually, I got allocations because one of these matrices is an MMatrix, not an SMatrix. The reason is:
sigma::MMatrix{3,3,Float64,9}
function modify_stress!(sigma, epsilon::SMatrix{3,3,Float64})
    sigma .= (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.)
end
Then, sigma got modified by function modify_stress!().
The thing is, I cannot use SMatrix for sigma because it does not allow sigma .= … And if I write sigma = (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) inside the function, sigma does not get modified!
How could I solve this dilemma?
You do not need to update sigma in place when you use static arrays, simply return the new one
function stress(epsilon::SMatrix{3,3,Float64})
    (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.)
end
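Note that as written this body actually returns a UniformScaling rather than an SMatrix; it still composes fine with SMatrix algebra, but to get a concrete 3x3 back you could multiply by one(SMatrix{3,3,Float64,9}) instead. Usage is then a plain rebinding:
epsilon = @SMatrix rand(3, 3)
sigma = stress(epsilon)  # rebinds sigma to a fresh value; nothing is mutated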
I’m confused. Does this work for you? For me it gives an error:
ERROR: MethodError: no method matching length(::UniformScaling{Float64})
What’s the output?
(BTW, are you aware of the tr function? You can use tr(epsilon) instead of epsilon[1,1]+epsilon[2,2]+epsilon[3,3].)
This works for me:
sigma .= tr(epsilon) * I(3)
But you should probably use SMatrix and just replace sigma, as @baggepinnen suggests.
Note that since inv(F)' == inv(F'), you probably want to compute P = J * (sigma / F') as a right division instead of forming the inverse explicitly; it will likely be both faster and more accurate.
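A quick sketch of the two forms side by side (random placeholder matrices; for a well-conditioned 3x3 they agree up to floating-point error):
using StaticArrays, LinearAlgebra
F = @SMatrix rand(3, 3)
sigma = @SMatrix rand(3, 3)
J = det(F)
P1 = J * sigma * inv(F)'  # explicit inverse
P2 = J * (sigma / F')     # right division: applies inv(F') without forming it
P1 ≈ P2                   # true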
To elaborate on this, it is usually better to write things in a functional way when dealing with StaticArrays. So instead of having modify_stress!, where you pass in the output argument, you simply have compute_stress, and you only pass in the strain. This makes the code less stateful, you pass fewer arguments to functions, and it usually gets significantly faster.
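A hedged sketch of that pattern, based on the linear-elastic update quoted later in this thread (the Material struct and its lambda/mu fields are hypothetical stand-ins):
using StaticArrays, LinearAlgebra

struct Material
    lambda::Float64  # first Lamé parameter (hypothetical field name)
    mu::Float64      # shear modulus (hypothetical field name)
end

# Functional version: takes the strain, returns a fresh stress SMatrix.
function compute_stress(mat::Material, epsilon::SMatrix{3,3,Float64,9})
    mat.lambda * tr(epsilon) * one(SMatrix{3,3,Float64,9}) + 2.0 * mat.mu * epsilon
end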
Since I see you are doing continuum mechanics, you could also take a look at Tensors.jl (https://github.com/Ferrite-FEM/Tensors.jl): efficient computations with symmetric and non-symmetric tensors, with support for automatic differentiation.
Yes, it works for me. My old code was:
function update_stress!(sigma::SMatrix{3,3,Float64},
                        epsilon::SMatrix{3,3,Float64})
    #sigma .= mat.lambda * (epsilon[1,1]+epsilon[2,2]+epsilon[3,3]) * UniformScaling(1.) + 2.0 * mat.mu * epsilon
end
I did not know of the tr() function. Thanks.
Thanks a lot Kristoffer.
Great, will try that. thanks.
Hi Kristoffer,
Tensors.jl is pretty cool. I have a maybe-stupid question: for 3x3 matrices, is Tensors.jl faster than StaticArrays.jl? I did some tests and the speed seems to be the same, but I want to ask your expert opinion.
Should be pretty much the same overall. For 3x3 it might be faster in scalar products with other second-order tensors, because we use explicit SIMD instructions whereas StaticArrays.jl needs help from LLVM (and LLVM is bad at the 3x3 case).
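One way to check this yourself is a direct micro-benchmark of the single contraction against the equivalent matrix product (a sketch; actual timings will depend on your CPU):
using BenchmarkTools, StaticArrays, Tensors

a_s = @SMatrix rand(3, 3); b_s = @SMatrix rand(3, 3)
a_t = rand(Tensor{2,3});   b_t = rand(Tensor{2,3})

@btime $a_s * $b_s  # StaticArrays: LLVM-generated matrix multiply
@btime $a_t ⋅ $b_t  # Tensors.jl: single contraction with explicit SIMD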
Carlsson, could you please help with this code?
struct Solid3D_1
    F :: Vector{SMatrix{3,3,Float64,9}} # deformation gradient F, 3x3 matrix
    ϵ :: Vector{SMatrix{3,3,Float64,9}} # strain, 3x3 matrix
    σ :: Vector{SMatrix{3,3,Float64,9}} # stress, 3x3 matrix
    parcount::Int64
    function Solid3D_1(count)
        strain = fill(zeros(3,3), count)
        defor  = fill(zeros(3,3), count)
        stress = fill(zeros(3,3), count)
        return new(defor, strain, stress, count)
    end
end
struct Solid3D_2
    F :: Vector{Tensor{2,3,Float64}} # Tensor{order,dim,T<:Real}; deformation gradient F, 3x3 tensor
    ϵ :: Vector{SymmetricTensor{2,3,Float64}} # strain, symmetric 3x3 tensor
    σ :: Vector{SymmetricTensor{2,3,Float64}} # stress, symmetric 3x3 tensor
    parcount::Int64
    function Solid3D_2(count)
        defor  = zeros(Tensor{2,3}, count) #fill(zero(Tensor{2, 3}), count)
        strain = zeros(SymmetricTensor{2,3}, count)
        stress = zeros(SymmetricTensor{2,3}, count)
        return new(defor, strain, stress, count)
    end
end
function main_static_arrays(solid)
    defor = solid.F
    epsi  = solid.ϵ
    stres = solid.σ
    for ip = 1:solid.parcount
        J = det(defor[ip])
        Finv = inv(defor[ip]) # no memory alloc
        P = J*stres[ip]*Finv' # convert to Piola-Kirchhoff stress, no memory alloc
    end
end
function main_tensor(solid)
    defor = solid.F
    epsi  = solid.ϵ
    stres = solid.σ
    for ip = 1:solid.parcount
        #J = det(defor[ip])
        Finv = inv(defor[ip]) # no memory alloc
        #P = J*stres[ip]⋅Finv' # convert to Piola-Kirchhoff stress, no memory alloc
    end
end
function main()
    parCount = 100000
    s1 = Solid3D_1(parCount)
    s2 = Solid3D_2(parCount)
    @btime main_static_arrays(s1)
    @btime main_tensor(s2)
end
main()
I got:
julia> include("Main_Test_Tensor.jl")
48.933 μs (0 allocations: 0 bytes)
3.037 ms (100000 allocations: 7.63 MiB)
Why?
From the docs example (https://kristofferc.github.io/Tensors.jl/stable/man/constructing_tensors/#Constructing-tensors-1), it looks like SymmetricTensor{order,dim,T} is an incomplete type specification, so the elements of Solid3D_2 are vectors with abstract element types. The complete type signatures must be F :: Vector{Tensor{2,3,Float64,9}} and ϵ :: Vector{SymmetricTensor{2,3,Float64,6}}, I guess.
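You can check this directly in the REPL; isconcretetype shows that the last type parameter (the number of stored components) is what makes the type concrete:
julia> using Tensors

julia> isconcretetype(SymmetricTensor{2,3,Float64})
false

julia> isconcretetype(SymmetricTensor{2,3,Float64,6})
true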
Try the following definition:
struct Solid3D_2{T<:Tensor{2,3,<:AbstractFloat}, ST<:SymmetricTensor{2,3,<:AbstractFloat}, IT<:Integer}
    F :: Vector{T}  # deformation gradient F, 3x3 tensor
    ϵ :: Vector{ST} # strain, symmetric 3x3 tensor
    σ :: Vector{ST} # stress, symmetric 3x3 tensor
    parcount::IT
    function Solid3D_2(count)
        defor  = zeros(Tensor{2,3}, count) #fill(zero(Tensor{2, 3}), count)
        strain = zeros(SymmetricTensor{2,3}, count)
        stress = zeros(SymmetricTensor{2,3}, count)
        T, ST = eltype(defor), eltype(strain)
        return new{T,ST,Int}(defor, strain, stress, count)
    end
end
Thanks Vasily. I used your suggestion, but without a parameterised struct:
function main_static_arrays(solid)
    defor = solid.F
    epsi  = solid.ϵ
    stres = solid.σ
    for ip = 1:solid.parcount
        J = det(defor[ip])
        Finv = inv(defor[ip]) # no memory alloc
        P = J*stres[ip]*Finv' # convert to Piola-Kirchhoff stress, no memory alloc
        L = Finv*Finv
        D = 0.5 * (L + L') # no memory alloc
    end
end
function main_tensor(solid)
    defor = solid.F
    epsi  = solid.ϵ
    stres = solid.σ
    for ip = 1:solid.parcount
        J = det(defor[ip])
        Finv = inv(defor[ip]) # no memory alloc
        P = J*stres[ip]⋅Finv' # convert to Piola-Kirchhoff stress, no memory alloc
        L = Finv⋅Finv
        D = symmetric(L) # no memory alloc
    end
end
function main1()
    parCount = 100000
    s1 = Solid3D_1(parCount)
    @btime main_static_arrays($s1)
end
function main2()
    parCount = 100000
    s2 = Solid3D_2(parCount)
    @btime main_tensor($s2)
end
main1()
main2()
julia> include("Main_Test_Tensor.jl")
51.429 μs (0 allocations: 0 bytes)
77.122 μs (0 allocations: 0 bytes)
It seems to me that symmetric(A) is slow: when I removed it, Tensors.jl was slightly faster.