Performance Issues - Rayleigh Compressible

I have this function called Rayleigh Compressible that is used to perform analysis of fluid flow stability,

using LinearAlgebra

function rayleighc(Ω, U, M, dU, D, D2)
    Λ = zeros(ComplexF64, 3*size(D, 1), length(Ω));
    V = zeros(ComplexF64, 3*size(D, 1), 3*size(D, 2), length(Ω));

    L = zeros(Float64, 3*size(D, 1), 3*size(D, 2))
    F = zeros(Float64, 3*size(D, 1), 3*size(D, 2))

    II = Matrix(1.0I(size(D, 1)));
    ZZ = zeros(Float64, size(D));

    N = length(Ω)

    for i = 1:N
        ω = Ω[i]

        F0 = -ω .* D2 - ω.^2 .* M.^2 .* II;
        F1 = diagm(U)*D2 - 2 .* diagm(dU)*D + 3 .* diagm(U) .* ω.^2 .* M.^2;
        F2 = ω .* II - 3 .* diagm(U) .* ω.^2 .* M.^2;
        F3 = -diagm(U) .+ diagm(U.^3) .* M.^2;

        L = [ZZ  II  ZZ;
             ZZ  ZZ  II;
            -F0 -F1 -F2];
        F = [II  ZZ  ZZ;
             ZZ  II  ZZ;
             ZZ  ZZ  F3];

        λ, v = eigen(L, F);

        o = sortperm(-imag(λ))
        λ = λ[o]
        v = v[:,o]

        Λ[:,i] = λ
        V[:,:,i] = v
    end

    return (Λ, V)
end

I run this function using BenchmarkTools, and I get this result,

BenchmarkTools.Trial: 
  memory estimate:  380.95 MiB
  allocs estimate:  825
  --------------
  minimum time:     28.811 s (0.69% GC)
  median time:      28.811 s (0.69% GC)
  mean time:        28.811 s (0.69% GC)
  maximum time:     28.811 s (0.69% GC)
  --------------
  samples:          1
  evals/sample:     1

as a reference, I’m using similar code in Matlab, and I got these times

Elapsed time is 7.599606 seconds.
Elapsed time is 8.759860 seconds.
Elapsed time is 9.054606 seconds.
Elapsed time is 8.965378 seconds.

Could someone help me how to improve the Julia code to be faster than Matlab version?

Below is the main code

using LinearAlgebra
using LaTeXStrings
using BenchmarkTools

include("cheb.jl")
include("rayleighc.jl")

validation = true;

N = 201;
D, x = cheb(N);
D2 = D * D;

l = 1.0
y = l .* x ./ sqrt.(1.0 .- x.^2);
Q = (1.0 .- x.^2); 

Dmod = diagm(sqrt.(Q) .* Q) * D ./ l;
D2mod = (diagm(Q.^3)*D2 - 3 .* diagm(Q.^2 .* x)*D) ./ l.^2;

x = y;
D = Dmod;
D2 = D2mod;

if validation
    Q = 0.8;
    U = 1.0 .- Q .* sech.(x).^2;
end

dU = D*U;
ddU = D2*U;

# %% Dirichlet BC
D = D[2:N, 2:N];
D2 = D2[2:N, 2:N];
x = x[2:N];
U = U[2:N];
dU = dU[2:N];
ddU = ddU[2:N];

# %% Solution
M = 0.1;
Ω = 0.1:0.1:1.0;

Λ, V = rayleighc(Ω, U, M, dU, D, D2);

The cheb.jl was improved in another Topic and is as follows.

function cheb(N)
    N == 0 && return (D, x)
    x = cospi.(0 : 1/N : 1)
    c = ones(Float64, N+1);
    c[begin] = c[end] = 2.0
    c[2:2:N+1] .*= -1.0
    D = (c*(1 ./ c)') ./ (x .- x' .+ I(N+1))
    D[diagind(D)] -= sum(D,dims=2)
    return (D, x)
end

What does profiling say? If the main cost is the eigen call its likely that it might just be an openblas vs MKL difference.

1 Like

Hello @kristoffer.carlsson, here is the profile

 Count  Overhead File                                                                                                      Line Function
 =====  ======== ====                                                                                                      ==== ========
     4         0 @Base/abstractarray.jl                                                                                    1153 setindex!
     4         0 @Base/abstractarray.jl                                                                                     632 similar
     4         0 @Base/abstractarray.jl                                                                                    1740 typed_hvcat(::Type{Float64}, ::Tuple{Int64,Int64,Int64}, ::Array{Float64,2}, ::Vararg{Array{Float64,2},N} where N)
     3         0 @Base/abstractarray.jl                                                                                    1756 typed_hvcat(::Type{Float64}, ::Tuple{Int64,Int64,Int64}, ::Array{Float64,2}, ::Vararg{Array{Float64,2},N} where N)
     1         1 @Base/array.jl                                                                                             371 copy
    10         0 @Base/array.jl                                                                                             357 fill!
     3         0 @Base/array.jl                                                                                             785 iterate
    10        10 @Base/array.jl                                                                                             847 setindex!
     2         2 @Base/array.jl                                                                                             849 setindex!
     4         0 @Base/array.jl                                                                                             380 similar
    10         0 @Base/array.jl                                                                                             521 zeros
    10         0 @Base/array.jl                                                                                             526 zeros
     1         0 @Base/arraymath.jl                                                                                          30 -(::Array{Float64,2})
     4         4 @Base/boot.jl                                                                                              408 Array
     4         0 @Base/boot.jl                                                                                              416 Array
  2210         0 @Base/boot.jl                                                                                              331 eval(::Module, ::Any)
     1         0 @Base/broadcast.jl                                                                                         621 _broadcast_getindex
     1         0 @Base/broadcast.jl                                                                                         648 _broadcast_getindex_evalf
     1         0 @Base/broadcast.jl                                                                                         826 broadcast_preserving_zero_d
     1         0 @Base/broadcast.jl                                                                                         862 copy
     1         0 @Base/broadcast.jl                                                                                         886 copyto!
     1         0 @Base/broadcast.jl                                                                                         931 copyto!
     1         0 @Base/broadcast.jl                                                                                         575 getindex
     1         0 @Base/broadcast.jl                                                                                         932 macro expansion
     1         0 @Base/broadcast.jl                                                                                         837 materialize
     4         0 @Base/cartesian.jl                                                                                          64 macro expansion
  2210         0 @Base/client.jl                                                                                            399 (::Base.var"#806#808"{Bool,Bool,Bool,Bool})(::Module)
  2210         0 @Base/client.jl                                                                                            506 _start()
  2210         0 @Base/client.jl                                                                                            313 exec_options(::Base.JLOptions)
  2210         0 @Base/client.jl                                                                                            383 run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool)
     1         0 @Base/combinatorics.jl                                                                                     116 permutecols!!(::Array{Complex{Float64},2}, ::Array{Int64,1})
     1         0 @Base/combinatorics.jl                                                                                     103 swapcols!(::Array{Complex{Float64},2}, ::Int64, ::Int64)
  2210         0 @Base/essentials.jl                                                                                        710 #invokelatest#1
  2210         0 @Base/essentials.jl                                                                                        709 invokelatest
     1         1 @Base/float.jl                                                                                             393 -
     1         1 @Base/int.jl                                                                                                86 +
     2         2 @Base/int.jl                                                                                                85 -
     2         0 @Base/int.jl                                                                                               922 -
  2210         0 @Base/loading.jl                                                                                          1088 include_string(::Function, ::Module, ::String, ::String)
  2210         0 @Base/loading.jl                                                                                          1096 include_string
     4         0 @Base/multidimensional.jl                                                                                  785 _setindex!
     4         0 @Base/multidimensional.jl                                                                                  789 _unsafe_setindex!(::IndexLinear, ::Array{Float64,2}, ::Array{Float64,2}, ::UnitRange{Int64}, ::UnitRange{Int64})
     4         0 @Base/multidimensional.jl                                                                                  797 macro expansion
     1         0 @Base/multidimensional.jl                                                                                  802 macro expansion
     3         0 @Base/multidimensional.jl                                                                                  803 macro expansion
     1         1 @Base/promotion.jl                                                                                         398 ==
     1         0 @Base/range.jl                                                                                             624 iterate
     1         0 @Base/simdloop.jl                                                                                           77 macro expansion
     1         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/LinearAlgebra.jl   349 copy_oftype
     3         3 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/blas.jl           1374 gemm!(::Char, ::Char, ::Float64, ::Array{Float64,2}, ::Array{Float64,2}, ::Float64, ::Array{Float64,2})
  2185         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/eigen.jl           430 eigen!(::Array{Float64,2}, ::Array{Float64,2}; sortby::typeof(LinearAlgebra.eigsortby))
     1         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/eigen.jl           441 eigen!(::Array{Float64,2}, ::Array{Float64,2}; sortby::typeof(LinearAlgebra.eigsortby))
     1         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/eigen.jl           447 eigen!(::Array{Float64,2}, ::Array{Float64,2}; sortby::typeof(LinearAlgebra.eigsortby))
  2188         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/eigen.jl           501 #eigen#79
  2188         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/eigen.jl           500 eigen
  2187         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/eigen.jl           428 eigen!
     1         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/eigen.jl           136 sorteig!
  2185      2185 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/lapack.jl         2135 ggev!(::Char, ::Char, ::Array{Float64,2}, ::Array{Float64,2})
     3         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/matmul.jl          160 *
     3         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/matmul.jl          597 gemm_wrapper!(::Array{Float64,2}, ::Char, ::Char, ::Array{Float64,2}, ::Array{Float64,2}, ::LinearAlgebra.MulAddMul{true,true,Bool,Bool})
     3         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/matmul.jl          169 mul!
     3         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/matmul.jl          208 mul!
  2210         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl                     292 run_repl(::REPL.AbstractREPL, ::Any; backend_on_current_task::Bool)
  2210         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl                     134 eval_user_input(::Any, ::REPL.REPLBackend)
  2210         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl                     195 repl_backend_loop(::REPL.REPLBackend)
  2210         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl                     288 run_repl(::REPL.AbstractREPL, ::Any)
  2210         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl                     180 start_repl_backend(::REPL.REPLBackend, ::Any)
     7         0 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/SparseArrays/src/sparsevector.jl    1079 hvcat
  2210         0 @VSCodeServer/src/repl.jl                                                                                   82 evalrepl(::Module, ::String, ::REPL.LineEditREPL, ::REPL.LineEdit.Prompt)
  2210         0 @VSCodeServer/src/repl.jl                                                                                   97 repleval(::Module, ::String, ::String)
     9         0 /home/sano/Dropbox/code/julia/0009_rayleighc_sech/rayleighc.jl                                               5 rayleighc(::Float64, ::Array{Float64,1}, ::Float64, ::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,2})
     1         0 /home/sano/Dropbox/code/julia/0009_rayleighc_sech/rayleighc.jl                                               8 rayleighc(::Float64, ::Array{Float64,1}, ::Float64, ::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,2})
     3         0 /home/sano/Dropbox/code/julia/0009_rayleighc_sech/rayleighc.jl                                              19 rayleighc(::Float64, ::Array{Float64,1}, ::Float64, ::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,2})
     2         0 /home/sano/Dropbox/code/julia/0009_rayleighc_sech/rayleighc.jl                                              23 rayleighc(::Float64, ::Array{Float64,1}, ::Float64, ::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,2})
     6         0 /home/sano/Dropbox/code/julia/0009_rayleighc_sech/rayleighc.jl                                              26 rayleighc(::Float64, ::Array{Float64,1}, ::Float64, ::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,2})
  2188         0 /home/sano/Dropbox/code/julia/0009_rayleighc_sech/rayleighc.jl                                              30 rayleighc(::Float64, ::Array{Float64,1}, ::Float64, ::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,2})
     1         0 /home/sano/Dropbox/code/julia/0009_rayleighc_sech/rayleighc.jl                                              37 rayleighc(::Float64, ::Array{Float64,1}, ::Float64, ::Array{Float64,1}, ::Array{Float64,2}, ::Array{Float64,2})
Total snapshots: 2719

Could you explain what these results mean?

I’ve looked for some solution, and I found the repository of JuliaComputing/MKL.jl and this topic " Unusually bad performance of eigen() compared to eig() in Matlab". After following the tutorial the code is running 3x faster than the first version, here is the benchmark

BenchmarkTools.Trial: 
  memory estimate:  379.85 MiB
  allocs estimate:  825
  --------------
  minimum time:     9.068 s (1.35% GC)
  median time:      9.068 s (1.35% GC)
  mean time:        9.068 s (1.35% GC)
  maximum time:     9.068 s (1.35% GC)
  --------------
  samples:          1
  evals/sample:     1

Many thanks to @kristoffer.carlsson by the repository.

1 Like