Acceleration of a function

Hello everyone,

I am currently transforming some Matlab code into Julia. Everything works fine, but I noticed a big difference in runtime. I don’t have much experience in that regard, so I started reading the Julia performance guidelines. Furthermore, I know that Matlab is highly parallelized, which I plan for my Julia code to some extent too. But first I want my code to run as fast as possible without parallelization.
In my code, the following function is used repeatedly, and it takes a rather long time to calculate (around 36 ms, evaluated with @btime):

using LinearAlgebra, SparseArrays

function gsm_cascade(Sc::Array{Array{Complex{Float64},2},2}, S0::Array{Array{Complex{Float64},2},2}, S1::Array{Array{Complex{Float64},2},2}, D::Diagonal{Complex{Float64},Array{Complex{Float64},1}})

     S111 = S1[1,1]
     S121 = S1[1,2]
     S211 = S1[2,1]
     S221 = S1[2,2]

     S110 = S0[1,1]
     S120 = S0[1,2]
     S210 = S0[2,1]
     S220 = S0[2,2]

     I = sparse(1.0LinearAlgebra.I, size(D,1), size(D,1))
     U1 = inv(I - S221*D*S110*D)
     U2 = inv(I - S110*D*S221*D)

     Sc[1,1] = S111 + S121*D*U2*S110*D*S211
     Sc[1,2] = S121*D*U2*S120
     Sc[2,1] = S210*D*U1*S211
     Sc[2,2] = S220 + S210*D*U1*S221*D*S120

     return Sc

end

For explanation: I have two matrices S0 and S1, which are used to compute the matrix Sc. Since each matrix is built from four submatrices, I decided to represent each of S0, S1 and Sc as an Array of Arrays, in order to keep track of the submatrices.
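For reference, the layout looks like this (assuming, just for illustration, that all four blocks are 32 by 32):

S1 = Array{Array{Complex{Float64},2},2}(undef, 2, 2)
for i in 1:2, j in 1:2
    S1[i,j] = rand(ComplexF64, 32, 32)   # each entry of S1 is itself a matrix
end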
At first I thought that taking the inverses was the most time-consuming part, but actually the matrix multiplications and additions of

     Sc[1,1] = S111 + S121*D*U2*S110*D*S211
     Sc[1,2] = S121*D*U2*S120
     Sc[2,1] = S210*D*U1*S211
     Sc[2,2] = S220 + S210*D*U1*S221*D*S120

are the slow part of the function. Does anybody know how I can accelerate it? The biggest matrix is S111, which is currently 32 by 32. Or would using parallelized code be the only option here?

Furthermore, I noticed that

S111 = S1[1,1]

does not allocate anything on the heap. Why is that the case? Is it because the matrix S1 is already allocated and fed into the function?

Thank you in advance and have a nice day :smiley:
Jonas

This is just a pointer assignment: S1 is a matrix of matrices, stored in practice as four pointers.
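You can check it, for instance, like this (getblock is just an illustrative name):

using BenchmarkTools

S1 = [rand(ComplexF64, 32, 32) for _ in 1:2, _ in 1:2]
getblock(S) = S[1, 1]   # returns a reference to the stored block, not a copy
@btime getblock($S1)    # reports 0 allocations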

  • that line I = sparse(1.0LinearAlgebra.I, size(D,1), size(D,1)) is unnecessary I think; just using I is fine
  • you probably shouldn’t use inv but rather factorize the matrix; it’s probably not going to do much for speed if that’s not the bottleneck, though.
  • Lines like S111 + S121*D*U2*S110*D*S211 allocate a lot of temporary arrays. That might be the problem here. You can use in-place operations (mul!, dot assignment and fusion); see the sketch after this list.
  • If your arrays are small, look up StaticArrays.jl.
  • If you want to compare against matlab be careful that matlab uses MKL, while julia uses OpenBLAS by default. You can make julia use MKL with the MKL.jl package.
  • You don’t need to type your input arguments
  • In all cases, profile to see what takes time and what impact your modifications have
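To make the in-place point concrete, here is a minimal sketch for the Sc[1,1] block only, assuming Sc[1,1] is preallocated and all blocks have the same square size; block11!, T1 and T2 are hypothetical names for two preallocated scratch matrices:

using LinearAlgebra

function block11!(out, S111, S121, S211, S110, D, U2, T1, T2)
    mul!(T1, S121, D)                 # T1 = S121*D
    mul!(T2, T1, U2)                  # T2 = S121*D*U2
    mul!(T1, T2, S110)                # T1 = S121*D*U2*S110
    mul!(T2, T1, D)                   # T2 = S121*D*U2*S110*D
    copyto!(out, S111)                # out = S111
    mul!(out, T2, S211, true, true)   # out += S121*D*U2*S110*D*S211
    return out
end

# Allocate the buffers once, outside the hot loop, and reuse them:
# T1 = similar(S111); T2 = similar(S111)
# block11!(Sc[1,1], S111, S121, S211, S110, D, U2, T1, T2)

The same pattern applies to the other three blocks, and the five-argument mul! fuses the final multiply-add. Note also that the sparse identity really is unnecessary: inv(I - S221*D*S110*D) works with the plain I exported by LinearAlgebra.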
1 Like

If that is indeed the case, the manual version (5 mul! and one add) is pretty annoying to write down. It would be cool if there was a package that could figure it out by itself in the general case, where the matrices might not all be the same size.

Hello Mr. Levitt,

thank you very much for your answer. I will look into everything you said. Since my code is dominated by matrix multiplications, maybe MKL.jl is the better choice for me.

What do you mean by " You don’t need to type your input arguments"? :smiley:

Best regards,
Jonas

This

gsm_cascade(Sc::Array{Array{Complex{Float64},2},2}, S0::Array{Array{Complex{Float64},2},2}, S1::Array{Array{Complex{Float64},2},2}, D::Diagonal{Complex{Float64},Array{Complex{Float64},1}})

can just be replaced by gsm_cascade(Sc, S0, S1, D). In Julia type annotations are used for dispatch and/or clarity, there’s no performance impact.

2 Likes

If I call a function I don’t use the type annotations. But if I define a function, for example in a module, aren’t the type annotations somehow important?

By the way, I installed MKL.jl, but unfortunately the Arpack package doesn’t work anymore for solving the generalized eigenvalue problems I need. So I have to find a solution for this now, haha.

Nope. They can be useful, but you mostly need them if you want to make several versions of the same function (also called ‘methods’).
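For example, a made-up toy function with two methods:

using LinearAlgebra

applyD(D::Diagonal, x) = D.diag .* x     # specialized method exploiting the structure
applyD(A::AbstractMatrix, x) = A * x     # generic fallback for any other matrix type

applyD(Diagonal(ones(3)), rand(3))   # the Diagonal method is selected
applyD(rand(3, 3), rand(3))          # the generic method is selected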

3 Likes

Ok thank you for your reply. But if the annotations aren’t destructive in any way, I think I will leave them for clarity :smiley:

Well, it depends what you mean by destructive. Those annotations mean that your code will fail for real-valued matrices, or if they are Diagonal, Hermitian, SharedArrays, StaticArrays, etc. Only arrays that are strictly of type Array, with precisely 64-bit floating-point complex elements, are accepted, which is very restrictive.

It also makes the function signature very verbose, which does not necessarily help clarity.

6 Likes

Really? There were issues like that which I thought had been fixed recently by switching to 32-bit MKL. Also worth noting that there are several good pure-Julia alternatives to Arpack, e.g. IterativeSolvers.jl and KrylovKit.jl.
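For instance, here is a rough sketch of driving KrylovKit for a generalized problem A*x = λ*B*x (the matrices below are made up for illustration; for a general B you would factorize it once, e.g. with lu, and reuse the factorization inside the map):

using KrylovKit, LinearAlgebra, SparseArrays

n = 100
A = sprand(ComplexF64, n, n, 0.05) + 2I   # made-up test matrix
B = Diagonal(rand(n) .+ 1)                # must be invertible for this transform

# The eigenvalues of x -> B \ (A*x) are the generalized eigenvalues of (A, B)
vals, vecs, info = eigsolve(x -> B \ (A * x), rand(ComplexF64, n), 6, :LM)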

Well, it depends what you mean by destructive. Those annotations mean that your code will fail for real-valued matrices, or if they are Diagonal, Hermitian, SharedArrays, StaticArrays, etc. Only arrays that are strictly of type Array, with precisely 64-bit floating-point complex elements, are accepted, which is very restrictive.

To expand on that, the standard way of doing things in julia is: 1) do not type input arguments unless you use them for dispatch or they help in understanding what the function does; 2) if you have to type them, use the least restrictive type possible. This makes code less verbose, and allows things like GPU, autodiff, multiple floating-point types, etc. to work without modifications to the code.
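Concretely (f_strict and f_loose are made-up names):

using LinearAlgebra

f_strict(x::Array{Complex{Float64},2}) = sum(abs2, x)   # only a dense complex Matrix
f_loose(x::AbstractMatrix) = sum(abs2, x)               # any matrix, any element type

f_loose(rand(3, 3))                     # works with real elements
f_loose(Diagonal(rand(ComplexF64, 3)))  # works with a structured matrix
# f_strict(rand(3, 3))                  # would throw a MethodError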

3 Likes

Yes, I get this error:
ERROR: LoadError: ARPACKException: unspecified ARPACK error: -10
Stacktrace:
[1] aupd_wrapper(::Type{T} where T, ::Arpack.var"#matvecA!#24"{SparseMatrixCSC{Float64,Int64}}, ::Arpack.var"#21#28"{SparseMatrixCSC{Float64,Int64}}, ::Arpack.var"#23#30", ::Int64, ::Bool, ::Bool, ::String,
::Int64, ::Int32, ::String, ::Float64, ::Int64, ::Int64, ::Array{Float64,1}) at path.julia\packages\Arpack\o35I5\src\libarpack.jl:76
[2] _eigs(::SparseMatrixCSC{Float64,Int64}, ::SparseMatrixCSC{Float64,Int64}; nev::Int64, ncv::Int64, which::Symbol, tol::Float64, maxiter::Int64, sigma::Nothing, v0::Array{Float64,1}, ritzvec::Bool) at path.julia\packages\Arpack\o35I5\src\Arpack.jl:181
[3] #eigs#11 at path.julia\packages\Arpack\o35I5\src\Arpack.jl:48 [inlined]
[4] top-level scope at …path…\Filter Tool\myfile.jl:19
[5] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1088
in expression starting at …path…\myfile.jl:19

Before switching to MKL.jl, using the “eigs” function worked like a charm. I tried KrylovKit.jl and it works, even with sparse arrays. I am not sure if it is as fast as eigs() was before, though. Thank you for the suggestion.

Possibly you need to rebuild Arpack? You can try opening an issue at Arpack.jl.

I am not sure if it is as fast as eigs() was before, though

They both implement variants of the same broad class of algorithms, so I would imagine their performance is similar. Comparisons would be interesting though. @juthohaegeman may have some thoughts on this.

I often find that you should supply the keyword argument sigma for eigs to work well. Of course, it depends on the matrix but…
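Something like this (a made-up symmetric-definite test problem, just to show the keyword):

using Arpack, SparseArrays

n = 100
A = spdiagm(-1 => -ones(n-1), 0 => fill(2.0, n), 1 => -ones(n-1))  # SPD test matrix
B = spdiagm(0 => rand(n) .+ 1)                                     # SPD "mass" matrix

# sigma turns on shift-and-invert, targeting the eigenvalues closest to sigma
vals, vecs = eigs(A, B; nev = 6, sigma = 0.0)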

Ok, thank you for your reply, I will try it!
Best regards.

I rebuilt Arpack with no success. But thank you very much for your help. I learned a lot :smiley:

You do you! Personally I like to leave them in too - when I’m looking back at the code in 3 months time the type annotations often get me back in the right headspace quickly.

2 Likes

For anyone interested, I got it to work now. Instead of Arpack I used the IterativeSolvers package, and everything works with MKL now. I got a massive boost in execution time: my function took about 800 ms before, and with MKL it now takes only about 2.5 ms. Back to work, and thank you to everyone for helping me!

Best regards,
Jonas

4 Likes