I am trying to compare speed and performance between Matlab and Julia on a code that does topology optimization of a continuum structure subjected to a given load. The code I am looking at is the public topopt88.m code ("Efficient topology optimization in MATLAB using 88 lines of code", from the TopOpt group).
Essentially it is an iterative algorithm where, in every iteration, an Ax=b system is solved (x = A\b). The sparse matrix A is the finite element stiffness matrix; it depends on the structural design and is updated in every iteration.
In Julia the same code runs slower than in Matlab. I have done some code optimization in Julia: declaring types in function definitions, using functions as much as possible, avoiding global variables, and applying other tips I found on the internet. But Julia is still slower than the same Matlab code (same in the sense of conceptual steps).
My question: since Matlab's system solve \ (mldivide) is multi-threaded by default, is the same true for Julia? If not, how do I multi-thread Julia's \ operator, or otherwise get similar speed-ups from parallelization?
P.S.
The following is the current best solution that I found and that gives the best performance for me. Outside Julia, in a terminal on Windows, type set JULIA_NUM_THREADS=4. Then start Julia and use the LinearAlgebra module like this: using LinearAlgebra followed by BLAS.set_num_threads(1).
In this way there will be 4 Julia threads, and by setting the BLAS threads to 1, BLAS basically will not use its own thread pool, which seems to conflict with Julia's thread pool, or at least slows down performance for me. I got a hint in this regard here: Julia Threads vs BLAS threads - #15 by alkorang. My current BLAS backend in Julia 1.5.1, reported by BLAS.vendor(), is :openblas64.
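To summarize, the setup that currently works best for me looks like this (a minimal sketch, assuming Windows, Julia 1.5.1 with OpenBLAS, and a 4-core machine):

```julia
# In a Windows terminal, before starting Julia:
#   set JULIA_NUM_THREADS=4
#   julia

using LinearAlgebra

Threads.nthreads()        # should report 4 Julia threads
BLAS.set_num_threads(1)   # keep OpenBLAS single-threaded so it does not
                          # compete with the Julia thread pool
BLAS.vendor()             # returns :openblas64 on my setup
```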
Using Pardiso.jl with the default MKL sparse solver didn’t help. I still need to figure out if there is a way to increase performance there too and how.
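For completeness, my Pardiso.jl attempt followed roughly the pattern sketched below. The random system is only a stand-in for K[freedofs,freedofs] and F, the thread count of 4 matches my machine, and MKLPardisoSolver, set_nprocs!, and solve are the Pardiso.jl entry points as I understand them:

```julia
using Pardiso, SparseArrays, LinearAlgebra

# Random SPD stand-in for the reduced stiffness system K[freedofs, freedofs] \ F
n = 1_000
A = sprand(n, n, 0.01)
A = A + A' + 2n * I          # symmetrize and make diagonally dominant (hence SPD)
b = rand(n)

ps = MKLPardisoSolver()      # MKL backend of Pardiso.jl
set_nprocs!(ps, 4)           # let MKL Pardiso use 4 threads (my core count)
x = solve(ps, A, b)
```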
Any chance you can post the code you're using in Julia? Functions like \ have dozens of different possible methods that will be called depending on the types of arguments they receive, so it is much more productive to talk about the performance of a specific piece of code than of an algorithm in general.
The full code is a bit too long, I feel, but essentially the heart of the code is a while loop that runs until convergence of the design update. In every iteration the following system is solved:
```julia
# FE-ANALYSIS
sK = reshape(KE[:] * (Emin .+ xPhys[:]' .^ penal * (E0 - Emin)), 64 * nelx * nely)  # SIMP-interpolated element stiffness values
K = sparse(iK, jK, sK)                    # assemble global stiffness matrix from triplets
K = (K + K') / 2                          # enforce exact symmetry
U[freedofs] = K[freedofs, freedofs] \ F   # solve the reduced system
```
where freedofs are the indices of the rows and columns to be kept for the analysis (the free degrees of freedom). K is sparse, symmetric, and positive definite, hence invertible; F and U are ordinary dense arrays.
You should probably try benchmarking your code, but it could be that the way you are using the sparse matrix is non-optimal… See here: Sparse Arrays · The Julia Language
Note especially the following:
Indexing operations, especially assignment, are expensive, when carried out one element at a time. In many cases it may be better to convert the sparse matrix into (I,J,V) format using findnz, manipulate the values or the structure in the dense vectors (I,J,V) , and then reconstruct the sparse matrix.
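As a toy illustration (not the actual FE assembly), constructing the matrix from (I, J, V) triplets in one sparse call, which your sparse(iK, jK, sK) line already does, is much cheaper than inserting entries one at a time into an existing sparse matrix:

```julia
using SparseArrays

# Toy 1D "stiffness" assembly from (I, J, V) triplets; duplicate entries are
# summed, exactly as in finite element assembly.
n = 10_000
Is, Js, Vs = Int[], Int[], Float64[]
for e in 1:n-1
    for (i, j, v) in ((e, e, 1.0), (e, e+1, -1.0), (e+1, e, -1.0), (e+1, e+1, 1.0))
        push!(Is, i); push!(Js, j); push!(Vs, v)
    end
end
K = sparse(Is, Js, Vs, n, n)

# The slow alternative: inserting entries one at a time keeps changing the
# sparsity pattern of an existing SparseMatrixCSC, e.g.
# K2 = spzeros(n, n)
# for e in 1:n-1
#     K2[e, e] += 1.0; K2[e, e+1] -= 1.0; K2[e+1, e] -= 1.0; K2[e+1, e+1] += 1.0
# end
```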
I removed K = (K+K')/2 ,
and replaced U[freedofs] = K[freedofs,freedofs]\F
with U[freedofs] = Symmetric(K[freedofs,freedofs])\F
and gained in performance. Thanks!
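If I understand the dispatch correctly, the Symmetric wrapper tells \ up front that the matrix is symmetric, so it can go straight to a CHOLMOD Cholesky/LDLt factorization instead of first probing the structure of a general sparse matrix. A self-contained sketch of the two calls, with a tridiagonal SPD matrix standing in for K[freedofs,freedofs]:

```julia
using SparseArrays, LinearAlgebra

# SPD tridiagonal stand-in for the reduced stiffness matrix
n = 20_000
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
b = rand(n)

x1 = A \ b              # generic sparse solve
x2 = Symmetric(A) \ b   # declares symmetry, enabling the Cholesky/LDLt path
@assert x1 ≈ x2
```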
I still think that there might be more to it. I am setting the number of threads outside Julia to 4 with set JULIA_NUM_THREADS=4, and in the script I call using LinearAlgebra followed by BLAS.set_num_threads(1); this way I get the best performance with OpenBLAS.
I feel that multithreading and the sparse matrix setup could be pushed further to gain more computational speed.
Does your matrix have banded or block-banded structure? If so, exploiting that could easily give you a factor of 10 speedup. Also, those structures are much easier to parallelize than CSC.
For a large symmetric positive definite matrix you are also likely better off using an iterative solver like conjugate gradients. It would be a good idea to try that out:
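For example with IterativeSolvers.jl (my package choice for this sketch; the 2D Laplacian below is only a stand-in for your stiffness matrix, and in practice a good preconditioner matters a lot):

```julia
using SparseArrays, LinearAlgebra, IterativeSolvers

# 2D Laplacian as an SPD stand-in for the reduced stiffness matrix
m = 200
T = spdiagm(-1 => fill(-1.0, m - 1), 0 => fill(2.0, m), 1 => fill(-1.0, m - 1))
A = kron(sparse(I, m, m), T) + kron(T, sparse(I, m, m))   # m^2 x m^2, SPD
b = rand(m^2)

x = cg(A, b)            # plain conjugate gradient; tolerances, iteration limits,
                        # and a preconditioner can be passed as keyword arguments
xd = A \ b              # direct solve, for comparison
norm(x - xd) / norm(xd) # relative difference between iterative and direct solutions
```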
Then the matrix must be extremely dense, for some reason. I solve 2D heat conduction models with 1 million dofs in ~5 seconds. Try to find the number of nonzeros, total and per row.
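A quick way to check (a sketch; the commented call at the end uses the K and freedofs names from your code):

```julia
using SparseArrays

# Print basic sparsity statistics for the matrix actually handed to the solver.
function sparsity_report(A::SparseMatrixCSC)
    n = size(A, 1)
    println("size             : ", size(A))
    println("total nonzeros   : ", nnz(A))
    println("nonzeros per row : ", round(nnz(A) / n, digits = 1))
end

# In the FE loop:
# sparsity_report(K[freedofs, freedofs])
```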