@stevengj Thank you for the explanation.
I don’t think you update different columns in the threads. `j2` in `add_col!` is `i1` in `add_colrow!`, which is the same `k` in `mod_col!` in all the threads.
This also solves my question (2) here: https://discourse.julialang.org/t/how-to-improve-efficiency-of-my-linear-algebra-code-matrix-equivalency/
Here are my timings for LU decompositions:

```
julia> using LinearAlgebra

julia> T=Float64; n=10^4; @time X=rand(T,n,n);
  0.373731 seconds (2 allocations: 762.939 MiB)

julia> @time L,U,p = lu(X); X[p,:] ≈ L*U
  7.881360 seconds (14 allocations: 2.235 GiB, 0.96% gc time)
true

julia> using AbstractAlgebra

julia> T=AbstractAlgebra.GF(2); n=10^4; @time X=AbstractAlgebra.matrix_space(T,n,n)(rand(0:1,n,n));
  3.527718 seconds (6 allocations: 2.235 GiB)

julia> @time r,d,p,L,U = fflu(X); p = invperm([p[i] for i=1:n]); X[p,:] == L*U
  47072.787102 seconds (30 allocations: 2.980 GiB, 0.00% gc time)
true

julia> using Nemo

julia> T=Nemo.GF(2); n=10^4; @time X=Nemo.matrix(T,n,n,rand(0:1,n,n));
  41.483070 seconds (200.00 M allocations: 5.215 GiB, 31.76% gc time)

julia> @time r,p,L,U = Nemo.lu(X); p = invperm([p[i] for i=1:n]); X[p,:] == L*U
  15.461073 seconds (10.16 k allocations: 4.183 GiB, 0.02% gc time)
true
```
It seems Flint via Nemo.jl is very performant, though still about twice as slow as LAPACK, and it incurs many unnecessary allocations.
I’m wondering how much work it would be for me to learn how to integrate Flint into my Julia library, so that it works on an ordinary `Matrix` instead of `Nemo.matrix`, and so that `lu!` causes zero allocations. What would be the best place to learn this? Reading the Nemo.jl source code? Is there a tutorial for integrating C libraries into Julia?
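For what it’s worth, Julia’s built-in C FFI is quite small to learn. Here is a minimal sketch using `@ccall` against a libc function that is guaranteed to be loaded; wrapping Flint would follow the same pattern, just pointing the call at `libflint` with the symbol names from Flint’s headers (e.g. `nmod_mat_lu` for LU over Z/nZ — the exact wrapper signatures would need to be checked against the Flint docs, so treat the Flint part as an assumption):

```julia
# Minimal sketch of Julia's zero-overhead C FFI.
# `cos` lives in libm, which is already linked into the Julia process,
# so no library prefix is needed here.
c_cos(x::Float64) = @ccall cos(x::Cdouble)::Cdouble

# For a third-party library the call names the shared object explicitly,
# e.g. (hypothetical, signature unverified):
#   @ccall libflint.nmod_mat_lu(P::Ptr{Clong}, A::Ref{nmod_mat_t}, 0::Cint)::Clong

c_cos(1.0)  # agrees with Julia's own cos(1.0)
```

Such a wrapper allocates nothing on the Julia side per call, which is what makes a zero-allocation `lu!` on a plain `Matrix` plausible once the data is passed as raw pointers.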
Also, is it possible to use Tullio.jl to speed up my naive implementation of `lu!`? I know that `Tullio.@tullio` can be used to speed up matrix multiplication over custom types.
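For reference, the matrix-multiplication use of `@tullio` mentioned above looks like this (a small sketch assuming Tullio.jl is installed; the element type here is plain `Int` standing in for a custom type):

```julia
using Tullio

A = rand(0:1, 8, 8)   # toy 0/1 data, e.g. lifted GF(2) entries
B = rand(0:1, 8, 8)

# Einsum-style contraction: sums over the repeated index k,
# and Tullio threads/unrolls the loops for generic element types.
@tullio C[i, j] := A[i, k] * B[k, j]

C == A * B  # same result as the built-in matmul
```

Whether this helps `lu!` is less clear: LU has a sequential dependency between pivot steps, so `@tullio` could at best accelerate the rank-1 update inside each step, not the outer loop.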