Why is lu() factorization not reducing the execution time?

I am using lu() to reduce the computation time, but it did not turn out as expected. Any idea why that is?

julia> A = [1 3 3 2;2 6 9 7; -1 -3 3 4]
3×4 Matrix{Int64}:
  1   3  3  2
  2   6  9  7
 -1  -3  3  4

julia> lu(A,check=false)
Failed factorization of type LU{Float64, Matrix{Float64}}

julia> b = [1,2,3]
3-element Vector{Int64}:
 1
 2
 3

julia> @btime A\b
  3.388 μs (37 allocations: 70.50 KiB)
4-element Vector{Float64}:
 -0.09523809523809525
 -0.28571428571428575
  0.21904761904761905
  0.3142857142857143

julia> @btime U\(L\b[p])
  4.029 μs (39 allocations: 71.08 KiB)
4-element Vector{Float64}:
 -3.333333333333332
  1.999999999999999
 -1.6666666666666656
  1.6666666666666674

First of all, you’re benchmarking in global scope with non-const variables; second, the results are not the same?

This is in local scope. As for the second point, yes, you are right about the results, but I guess this is due to the pseudo-inverse algorithm.

julia> @btime $A\$b
  3.800 μs (37 allocations: 70.50 KiB)
4-element Vector{Float64}:
 -0.09523809523809525
 -0.28571428571428575
  0.21904761904761905
  0.3142857142857143

julia> @btime $U\($L\$b[p])
  4.486 μs (39 allocations: 71.08 KiB)
4-element Vector{Float64}:
 -3.333333333333332
  1.999999999999999
 -1.6666666666666656
  1.6666666666666674

What are U, L, and p?

The LU factorization of the matrix, i.e., the results of the lu() function.
By the way, based on the results shown above, it seems there is no difference between benchmarking in global scope and in local scope with non-const variables, right? In other words, why do we have to put $ in front of non-const variables after @btime?

First, the right way to use an LU factorization is to call \ with the factorization object, not to individually apply the L and U factors yourself:

julia> A = randn(1000,1000); b = randn(1000);

julia> LU = lu(A);

julia> @btime $A \ $b;
  9.904 ms (4 allocations: 7.64 MiB)

julia> @btime $LU \ $b;
  241.441 μs (1 allocation: 7.94 KiB)

Second, your matrix is too small:

For a 3x3 matrix, the overhead of LAPACK etcetera is enormous. StaticArrays will be much faster for such small problems if the size is fixed:

julia> A = randn(3,3); b = randn(3);

julia> LU = lu(A);

julia> sA = SMatrix{3,3}(A); sb = SVector{3}(b);

julia> @btime $A \ $b;
  417.249 ns (3 allocations: 384 bytes)

julia> @btime $LU \ $b;
  167.488 ns (1 allocation: 112 bytes)

julia> @btime $sA \ $sb;
  6.537 ns (0 allocations: 0 bytes)
14 Likes

Thank you very much, @stevengj. So, how do I obtain the factorization object for the matrix below so that I can use it with \?

julia> A = [1 3 3 2;2 6 9 7; -1 -3 3 4];

julia> LU = lu(A,check=false);

julia> b = [1,2,3];

julia> @btime $A \ $b;
  2.812 μs (37 allocations: 70.50 KiB)

julia> @btime $LU \ $b;
ERROR: DimensionMismatch("matrix is not square: dimensions are (3, 4)")
Stacktrace:
  [1] checksquare
    @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\LinearAlgebra.jl:224 [inlined]     
  [2] getrs!(trans::Char, A::Matrix{Float64}, ipiv::Vector{Int64}, B::Vector{Float64})
    @ LinearAlgebra.LAPACK C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\lapack.jl:1014
  [3] ldiv!
    @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\lu.jl:391 [inlined]
  [4] \(F::LinearAlgebra.LU{Float64, Matrix{Float64}}, B::Vector{Int64})
    @ LinearAlgebra C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\factorization.jl:104
  [5] var"##core#275"(LU#273::LinearAlgebra.LU{Float64, Matrix{Float64}}, b#274::Vector{Int64})
    @ Main C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:479
  [6] var"##sample#276"(__params::BenchmarkTools.Parameters)
    @ Main C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:485
  [7] _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwargs::Base.Iterators.Pairs{Symbol, Integer, NTuple{4, Symbol}, NamedTuple{(:samples, :evals, :gctrial, :gcsample), Tuple{Int64, Int64, Bool, Bool}}})
    @ BenchmarkTools C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:98
  [8] #invokelatest#2
    @ .\essentials.jl:710 [inlined]
  [9] #run_result#45
    @ C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:33 [inlined]
 [10] run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, kwargs::Base.Iterators.Pairs{Symbol, Integer, NTuple{5, Symbol}, NamedTuple{(:verbose, :samples, :evals, :gctrial, :gcsample), Tuple{Bool, Int64, Int64, Bool, Bool}}})
    @ BenchmarkTools C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:116
 [11] #warmup#54
    @ C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:168 [inlined]
 [12] warmup(item::BenchmarkTools.Benchmark)
    @ BenchmarkTools C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:168
 [13] top-level scope
    @ C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:565

Another question: why does benchmarking have to be done in local scope rather than in global scope with non-const variables?

@stevengj
I tried to continue with this MWE and run julia> @btime $sLU \ $sb; as shown below,

sLU = SMatrix{size(LU,1),size(LU,2)}(LU);
ERROR: ArgumentError: Size mismatch in Static Array parameters. Got size Tuple{3, 3}, dimension 2 and length 1.
Stacktrace:
 [1] macro expansion
   @ C:\Users\amroa\.julia\packages\StaticArrays\vxjOO\src\util.jl:0 [inlined]
 [2] check_array_parameters(#unused#::Type{Tuple{3, 3}}, #unused#::Type{LinearAlgebra.LU{Float64, Matrix{Float64}}}, #unused#::Type{Val{2}}, #unused#::Type{Val{1}})
   @ StaticArrays C:\Users\amroa\.julia\packages\StaticArrays\vxjOO\src\util.jl:57
 [3] SMatrix{3, 3, LinearAlgebra.LU{Float64, Matrix{Float64}}, 1}(x::Tuple{LinearAlgebra.LU{Float64, Matrix{Float64}}})
   @ StaticArrays C:\Users\amroa\.julia\packages\StaticArrays\vxjOO\src\SArray.jl:22
 [4] macro expansion
   @ C:\Users\amroa\.julia\packages\StaticArrays\vxjOO\src\SMatrix.jl:36 [inlined]
 [5] SArray
   @ C:\Users\amroa\.julia\packages\StaticArrays\vxjOO\src\SMatrix.jl:32 [inlined]
 [6] (SMatrix{3, 3, T, L} where {T, L})(x::LinearAlgebra.LU{Float64, Matrix{Float64}})
   @ StaticArrays C:\Users\amroa\.julia\packages\StaticArrays\vxjOO\src\convert.jl:4
 [7] top-level scope
   @ REPL[28]:1

# Below is what I wanted to do:
#julia> @btime $sLU \ $sb;

Your matrix is rank-deficient, so the decomposition fails - unless you’re tied to a particular decomposition, you can instead use factorize(A), which dispatches to a polyalgorithm according to the properties of A (see this chart, or the docs). For example,

julia> f = factorize(A)
QRPivoted{Float64, Matrix{Float64}}
Q factor:
3×3 LinearAlgebra.QRPackedQ{Float64, Matrix{Float64}}:
 -0.301511  -0.275241  -0.912871
 -0.904534  -0.220193   0.365148
 -0.301511   0.935819  -0.182574
R factor:
3×4 Matrix{Float64}:
 -9.94987  -5.4272   -8.14081      -1.80907
  0.0      -4.95434   1.65145      -1.65145
  0.0       0.0       1.50772e-15   2.36998e-16
permutation:
4-element Vector{Int64}:
 3
 2
 4
 1

julia> f\b
4-element Vector{Float64}:
 -0.09523809523809522
 -0.2857142857142857
  0.2190476190476191
  0.3142857142857143

Julia’s speed requires that the compiler can figure out the machine representation (the type) of all the variables participating in a calculation. With non-constant global variables, you can redefine their types in the middle of a calculation, or between function calls, which makes it impossible for the compiler to optimize functions that depend on such variables. There’s little point to benchmarking a function if it’s hobbled by global variables - we know it’ll be slow.
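For concreteness, here is a rough sketch of what the $ interpolation changes (the variable names and sizes are just illustrative, and the exact timings will vary by machine):

using BenchmarkTools, LinearAlgebra

A = randn(3, 3)     # non-const globals: their type can change at any time,
b = randn(3)        # so code that reads them cannot be fully optimized

@btime A \ b;       # measured through the untyped globals (extra dispatch overhead)
@btime $A \ $b;     # $ splices the values in, as if A and b were typed local variables

const Ac = A        # alternatively, const globals have a fixed type
const bc = b
@btime Ac \ bc;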

2 Likes

@stillyslalom, thank you for your explanation. So, putting $ in front of the variables in @btime fixes their type during the benchmarking, right?

Can you tell me what I should do in my MWE to overcome the error?

using SparseArrays;
using LinearAlgebra;
using BenchmarkTools
using StaticArrays
#A = [1 3 3 2;2 6 9 7; -1 -3 3 4]; #"Rank-deficient and Decomposition fails"
A = [1 3 3 2;2 6 9 7; 0 -3 3 4];   #Rank is fine and decomposition succeeds
b = [1,2,3];

if rank(A) == size(A,1)
  println("Rank is fine and decomposition succeeds")
  LU = lu(A);
  @btime $A \ $b
  @btime $LU \ $b
else
  println("Rank-deficient and Decomposition fails")
  f = factorize(A);
  @btime $A \ $b
  @btime $f \ $b
end

ERROR: LoadError: DimensionMismatch("matrix is not square: dimensions are (3, 4)")
Stacktrace:
  [1] checksquare
    @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\LinearAlgebra.jl:224 [inlined]     
  [2] getrs!(trans::Char, A::Matrix{Float64}, ipiv::Vector{Int64}, B::Vector{Float64})
    @ LinearAlgebra.LAPACK C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\lapack.jl:1014
  [3] ldiv!
    @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\lu.jl:391 [inlined]
  [4] \(F::LinearAlgebra.LU{Float64, Matrix{Float64}}, B::Vector{Int64})
    @ LinearAlgebra C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\LinearAlgebra\src\factorization.jl:104 
  [5] var"##core#289"(LU#269::LinearAlgebra.LU{Float64, Matrix{Float64}}, b#270::Vector{Int64})
    @ Main C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:479
  [6] var"##sample#290"(__params::BenchmarkTools.Parameters)
    @ Main C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:485
  [7] _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwargs::Base.Iterators.Pairs{Symbol, Integer, NTuple{4, Symbol}, NamedTuple{(:samples, :evals, :gctrial, :gcsample), Tuple{Int64, Int64, Bool, Bool}}})
    @ BenchmarkTools C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:98
  [8] #invokelatest#2
    @ .\essentials.jl:710 [inlined]
  [9] #run_result#45
    @ C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:33 [inlined]
 [10] run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, kwargs::Base.Iterators.Pairs{Symbol, Integer, NTuple{5, Symbol}, NamedTuple{(:verbose, :samples, :evals, :gctrial, :gcsample), Tuple{Bool, Int64, Int64, Bool, Bool}}})
    @ BenchmarkTools C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:116
 [11] #warmup#54
    @ C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:168 [inlined]
 [12] warmup(item::BenchmarkTools.Benchmark)
    @ BenchmarkTools C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:168
 [13] macro expansion
    @ C:\Users\amroa\.julia\packages\BenchmarkTools\O3UL3\src\execution.jl:565 [inlined]
 [14] top-level scope
    @ c:\Users\amroa\OneDrive - polymtl.ca\AmroAlsabbagh\Code\Julia\MANAOob\3Phase\usingLU.jl:16
in expression starting at c:\Users\amroa\OneDrive - polymtl.ca\AmroAlsabbagh\Code\Julia\MANAOob\3Phase\usingLU.jl:11

The error message (“matrix is not square”) tells you exactly what is wrong — you can’t do an LU factorization of a non-square (3x4) matrix.

(In real applications, usually you know what kind of problem you are solving in advance, so you don’t need to resort to multiple factorization branches at runtime.)
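(If you really do need to branch at runtime, one possible sketch, using only the shape of A, is below; solve is just a hypothetical helper name, and it simply mirrors what factorize falls back to for rectangular input.)

using LinearAlgebra

# Hypothetical helper: pick a factorization based on the shape of A.
function solve(A::AbstractMatrix, b::AbstractVector)
    if size(A, 1) == size(A, 2)
        F = lu(A, check=false)
        issuccess(F) && return F \ b    # square and nonsingular: use the LU
    end
    return qr(A, Val(true)) \ b         # rectangular or singular: pivoted QR (least squares)
end

A = [1 3 3 2; 2 6 9 7; -1 -3 3 4]
b = [1, 2, 3]
solve(A, b)    # 3x4 input, so this takes the pivoted-QR branch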

1 Like

@stevengj
Got it. It also seems I have a misunderstanding about the following. What does it mean when we type A \ b as below? It is not inv(A)*b, right?

julia> A = [1 3 3 2;2 6 9 7; -1 -3 3 4];
julia> b = [1,2,3];
julia> A \ b
4-element Vector{Float64}:
 -0.09523809523809525
 -0.28571428571428575
  0.21904761904761905
  0.3142857142857143

What is the function in Julia to check whether a matrix is square, rather than typing size(A,1) == size(A,2)?

See the \ documentation (type ?\ in the REPL):

For rectangular A the result is the minimum-norm least squares solution computed by a pivoted QR factorization of A and a rank estimate of A based on the R factor.

In your case, you have a 3x4 matrix A, so Ax=b has more unknowns than equations (and this particular A is also rank-deficient); there are infinitely many x that minimize the residual norm, and as explained above \ returns the minimizer with the smallest norm.

inv(A) does not exist for a non-square matrix. However, A \ b is mathematically equivalent to pinv(A) * b, where pinv is the pseudo-inverse (though \ does not actually compute the pseudo-inverse). Computationally, one rarely needs to form a matrix inverse (or pseudo-inverse) explicitly.
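For example, a small check along those lines (not run above, just a sketch with the same A and b):

using LinearAlgebra

A = [1 3 3 2; 2 6 9 7; -1 -3 3 4]
b = [1, 2, 3]

x = A \ b          # minimum-norm least-squares solution (pivoted QR under the hood)
y = pinv(A) * b    # pseudo-inverse route: mathematically the same vector
x ≈ y              # should hold up to rounding

# Adding a nullspace component leaves the residual unchanged but increases the norm:
N = nullspace(A)                          # A is rank-deficient, so N is nonempty
v = N * ones(size(N, 2))
norm(A * (x + v) - b) ≈ norm(A * x - b)   # same residual
norm(x + v) > norm(x)                     # but a larger norm than x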

Well, you could type ==(size(A)...) if you want something more compact. There is also LinearAlgebra.checksquare, but that throws an error if the matrix is non-square (it’s intended for use in functions that require a square matrix).
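For example (with the 3x4 A from above):

using LinearAlgebra

A = [1 3 3 2; 2 6 9 7; -1 -3 3 4]

size(A, 1) == size(A, 2)        # false
==(size(A)...)                  # false; the same check written more compactly
LinearAlgebra.checksquare(A)    # throws DimensionMismatch because A is 3x4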

What is your application, such that you don’t know ahead of time whether you want an exact solution or a least-squares solution (i.e., whether your problem is over/under-determined or square and non-singular)?

You can do qr(A, Val(true)) \ b if you know that you always want the QR minimum-norm/least-squares solution (which gives the exact solution if A is square and invertible).
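For example, with the same A and b as above, the factorization object can be created once and reused (a small sketch):

using LinearAlgebra

F = qr(A, Val(true))   # pivoted QR (spelled ColumnNorm() in newer Julia versions)
x = F \ b              # same minimum-norm least-squares solution that A \ b returns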

4 Likes

Got it. Thank you very much for your explanation.
It seems that Mat1\Mat2 is much faster than using inv(Mat1)*Mat2 (given that Mat1 is square). Do you agree?

julia> s = [1 2; 3 4];

julia> b= [1;2];

julia> @btime $s \ $b;
  365.534 ns (3 allocations: 304 bytes)

julia> @btime inv($s)*$b;
  640.476 ns (7 allocations: 1.66 KiB)

Can we draw the following conclusions based on the results below? LU factorization should always give faster execution than no factorization. Using a sparse matrix is only helpful when the matrix is large.

julia> A = [1 3 3; 6 9 7; -3 3 4];

julia> As = sparse([1 3 3; 6 9 7; -3 3 4]);

julia> b = [1,2,3];

julia> LU = lu(A);

julia> LUs = lu(As);

julia> @btime $A \ $b;
  451.269 ns (3 allocations: 384 bytes)

julia> @btime $As \ $b;
  5.283 μs (70 allocations: 18.06 KiB)

julia> @btime $LU \ $b;
  130.135 ns (1 allocation: 112 bytes)

julia> @btime $LUs \ $b;
  315.063 ns (4 allocations: 400 bytes)

Please take a look at the documentation, which should answer most of your questions.

It is generally faster and numerically more accurate to do A\B rather than inv(A)*B. More information is available in any modern linear algebra class or book that covers the computational side. If the book uses Matlab, it will explicitly use \.
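As a quick illustration of the numerical point (the matrix here is just an ill-conditioned example, not from this thread):

using LinearAlgebra

n = 12
A = [1 / (i + j - 1) for i in 1:n, j in 1:n]   # Hilbert-like, badly conditioned
b = A * ones(n)

res_backslash = norm(A * (A \ b) - b)          # residual of the direct solve
res_inv       = norm(A * (inv(A) * b) - b)     # residual via the explicit inverse

# The backslash solve is backward stable, so res_backslash is typically tiny;
# forming inv(A) explicitly usually leaves a noticeably larger residual when A
# is ill-conditioned (and it is also slower, as the timings above suggest).
(res_backslash, res_inv)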

2 Likes

Thank you for the guidance.

…a sparse matrix is helpful when the matrix is actually sparse (mostly zeros), not merely when it is large.
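For example, a sketch of where SparseArrays really pays off (sizes chosen only for illustration; timings omitted since they depend on the machine):

using LinearAlgebra, SparseArrays, BenchmarkTools

n = 2_000
# A large tridiagonal system: only about 3n nonzeros out of n^2 entries.
As = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
Ad = Matrix(As)          # the same matrix stored densely
b  = rand(n)

@btime $As \ $b;         # sparse solve: works only on the nonzero structure
@btime $Ad \ $b;         # dense solve: O(n^3) work and O(n^2) memory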

1 Like

Yes, because doing A and then B is always going to be slower than doing just A; in this case, inv() needs an lu() internally anyway, so inv(A)*b performs the factorization plus extra work.

1 Like