# LU factorization performance issue

I am comparing times taken for LU decomposition and my julia version seems to do badly compared to other language versions. The matrix is a 10000x10000 random dense matrix generated using `DLATMR`.

Calling `dgetrf` (openblas v0.3.20) directly from fortran takes about 3.7s. Python (scipy) takes about 5.4s. Julia for some reason takes about 15s. I include the python and julia codes. I was expecting a similar timing as the other two versions. `BLAS.get_config()` shows that Julia is also calling an OpenBLAS library. I am only measuring the factorization time.

I am trying to find the reason for the apparent slowness of julia in this case.

Python Version
``````import numpy as np
import struct
import scipy.linalg as la
import time
with open("A.dat", 'rb') as f:

N,= struct.unpack('i', data[:4])
A= np.frombuffer(data[4:4+8*N*N], dtype=np.float64)
A= A.reshape((N,N))
B= np.frombuffer(data[8*N*N+4:], dtype=np.float64)
print(f"{N=}")
# print(A)
# print(B)

t1= time.time()
lu,piv= la.lu_factor(A)
t2=time.time()-t1
print(f"{t2=}")
x= la.lu_solve((lu,piv), B)
print(x[:10])
print("B-AX==0 ? :",np.allclose(A@x,B))
``````
Julia v1.7.3
``````using LinearAlgebra, Statistics;
io=open("A.dat","r")
A=Array{Float64}(undef,N,N)
B=Array{Float64}(undef,N,1)
close(io)
print(N,"\n")

t0=time()
L,U,P=lu(A)
t1=time()

X=U\(L\B[P])

print(t1-t0, '\n')
print(X[1:10], '\n')
print(sqrt(mean((B-A*X).^2)))

``````

The matrix itself is nothing special but its a bit large to share. The config used to generate it is below.

dlatmr config
``````  M=10000
N=10000
SYM='S'
DIST='U'
ISEED = [0,47,2000,101]
MODE = 5
COND = 1000.0
DMAX = 100D0
RSIGN = 'T'
MODEL = 1
MODER = 1
PIVTNG = 'N'
SPARSE = 0.0
KL = M
KU = M
ANORM = 1.0d0
PACK = 'N'
LDA = M
``````

can’t reproduce

``````In : import numpy as np

In : A = np.random.rand(10000, 10000);

In : import scipy.linalg as la

In : %time la.lu_factor(A)
CPU times: user 39.7 s, sys: 3.03 s, total: 42.8 s
Wall time: 3.15 s

In : %time la.lu_factor(A);
CPU times: user 39.1 s, sys: 3.26 s, total: 42.4 s
Wall time: 3.04 s
``````

## Julia

``````julia> using LinearAlgebra

julia> const B = rand(10000, 10000);

julia> @time lu(B);
1.990439 seconds (4 allocations: 763.016 MiB)

julia> @time lu(B);
1.860999 seconds (4 allocations: 763.016 MiB, 0.38% gc time)
``````
2 Likes

maybe give these two `const` and also don’t include Julia JIT compile time when benchmarking

Interesting. Something is wrong indeed with my installation of Julia then.

``````julia> using LinearAlgebra

julia> const B = rand(10000,10000);

julia> @time lu(B);
14.362888 seconds (4 allocations: 763.016 MiB, 0.03% gc time)

julia> @time lu(B);
14.545017 seconds (4 allocations: 763.016 MiB, 0.14% gc time)
``````

What cpu do you have? Also, how did you install Julia?

OS is Windows 10

Maybe you are in a single thread session ?

11.503429 seconds (4 allocations: 763.016 MiB)

1.879096 seconds (4 allocations: 763.016 MiB)

1 Like

I dont think that is the issue. The previous ones were single threaded. This one with 8 threads still shows same result

``````julia> Threads.nthreads()
8

julia> const B = rand(10000,10000);

julia> using LinearAlgebra

julia> @time lu(B);
14.207715 seconds (4 allocations: 763.016 MiB, 0.02% gc time)

julia>
``````

how much RAM do you have?

32GB RAM

also thanks for extracting the core problem from my mess of a post haha.

what if you first

``````BLAS.set_num_threads(4)
``````

?

Doesnt help unfortunately. I tried setting it to 4 and 8. `BLAS.get_num_threads` was 8 by default actually.

I tried to replace Julia’s libopenblas with another one. The replacement was not ilp64. Im going to build an ilp64 version and see if that clarifies things.

The problem is coming from the openblas bundled with the installer (v1.7.3 in this case). Replacing the `libopenblas64_.dll` with a built from source version fixed this problem.

``````julia> using LinearAlgebra;

julia> const B = rand(10000,10000);

julia> @time lu(B);
3.506253 seconds (4 allocations: 763.016 MiB, 0.09% gc time)

julia> @time lu(B);
3.607796 seconds (4 allocations: 763.016 MiB, 0.07% gc time)

julia>
``````

1 Like

I did install it from the website.
In trying to find the issue, inside Julia’s installation folder I replaced the openblas library with one that i built from source.

Also I went to file a bug report but this was already there

2 Likes

yeah you should have said that as the first thing sir

I think you misunderstood. The one shipped with the installer is borked. The replacement works as expected.

I see, hopefully this is fixed in the upcoming 1.8 as discussed in the github issue.

I just tested with 1.8 rc1 and it works properly out of the box.

2 Likes