Parallel matrix col/row operations give incorrect results

@stevengj Thank you for the explanation.

I don’t think the threads update different columns: j2 in add_col! is i1 in add_colrow!, which is the same k in mod_col! in all the threads — so every thread ends up writing to the same column.
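To make the point concrete, here is a minimal sketch of the thread-safe pattern: each thread must own a distinct destination column, with the shared column only read. The add_col! here is a simplified stand-in for the functions discussed above, not the original code.

```julia
using Base.Threads, LinearAlgebra

# add_col!(X, j, k, c): add c * column k into column j.
# Safe under @threads as long as every thread gets a different j
# and column k itself is only read, never written.
function add_col!(X, j, k, c)
    @inbounds for i in axes(X, 1)
        X[i, j] += c * X[i, k]
    end
end

X = Matrix{Float64}(I, 4, 4)
k = 1
@threads for j in 2:size(X, 2)   # each thread writes its own column j ≠ k
    add_col!(X, j, k, 2.0)
end
```

If instead all threads wrote to the same column (the j2 == i1 == k situation above), the updates would race and the result would be wrong.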

This also answers question (2) from my earlier post: https://discourse.julialang.org/t/how-to-improve-efficiency-of-my-linear-algebra-code-matrix-equivalency/

Here are my timings for LU decompositions:

```julia
julia> using LinearAlgebra
julia> T=Float64; n=10^4;   @time X=rand(T,n,n);
  0.373731 seconds (2 allocations: 762.939 MiB)
julia> @time L,U,p = lu(X);   X[p,:] ≈ L*U
  7.881360 seconds (14 allocations: 2.235 GiB, 0.96% gc time)
true
```

```julia
julia> using AbstractAlgebra
julia> T=AbstractAlgebra.GF(2); n=10^4;   @time X=AbstractAlgebra.matrix_space(T,n,n)(rand(0:1,n,n));
  3.527718 seconds (6 allocations: 2.235 GiB)
julia> @time r,d,p,L,U = fflu(X);   p = invperm([p[i] for i=1:n]);   X[p,:] == L*U
47072.787102 seconds (30 allocations: 2.980 GiB, 0.00% gc time)
true
```

```julia
julia> using Nemo
julia> T=Nemo.GF(2); n=10^4;   @time X=Nemo.matrix(T,n,n,rand(0:1,n,n));
 41.483070 seconds (200.00 M allocations: 5.215 GiB, 31.76% gc time)
julia> @time r,p,L,U = Nemo.lu(X);   p = invperm([p[i] for i=1:n]);   X[p,:] == L*U
 15.461073 seconds (10.16 k allocations: 4.183 GiB, 0.02% gc time)
true
```

It seems Flint (via Nemo.jl) is very performant, though still twice as slow as LAPACK :cry: and it causes many unnecessary allocations.

I’m wondering how much work it would be for me to learn to integrate Flint into my Julia library, so that it works on an ordinary Matrix instead of Nemo.matrix and lu! causes 0 allocations. What would be the best place to learn this — studying the Nemo.jl code? Is there a tutorial on integrating C libraries into Julia?
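For reference, the general mechanism is Julia's ccall (documented in the manual under "Calling C and Fortran Code"): an ordinary Matrix is passed to C as a raw pointer to its contiguous column-major data, with no copying or allocation. Here memcpy from libc stands in for a Flint routine — the actual Flint function names and signatures would need to be looked up in flint.h.

```julia
# Hedged sketch: calling a C function on an ordinary Julia Matrix via ccall.
# The Matrix auto-converts to a pointer to its (column-major) data buffer,
# so the call itself performs zero allocations.
src = rand(3, 3)
dst = zeros(3, 3)
GC.@preserve src dst begin   # keep the arrays rooted while C holds pointers
    ccall(:memcpy, Ptr{Cvoid},
          (Ptr{Float64}, Ptr{Float64}, Csize_t),
          dst, src, sizeof(src))
end
dst == src  # true
```

Wrapping Flint the way Nemo.jl does would follow the same pattern, just with Flint's own matrix struct layout and a `Libdl`-loaded libflint instead of libc.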

Also, is it possible to use Tullio.jl to speed up my naive implementation of lu!? I know that Tullio.@tullio can speed up matrix multiplication over custom types.
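For context, the Tullio matrix-multiplication pattern referred to above looks like this (assuming Tullio.jl is installed; the summation over the repeated index k is implicit):

```julia
using Tullio

A = rand(4, 4)
B = rand(4, 4)

# := creates C; the repeated index k on the right is summed over,
# so this is C[i,j] = Σₖ A[i,k] * B[k,j], i.e. C = A * B.
@tullio C[i, j] := A[i, k] * B[k, j]
```

Whether this helps inside lu! is a separate question, since Gaussian elimination has sequential pivoting dependencies that this kind of index-notation kernel doesn't express directly.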