Hi! I have a question about allocations and speed trade-off when using Sparse Arrays.
TLDR question: Sparse matrix multiplication is fast for my case but allocates a lot of memory; with dense matrices the code is slower but allocations are zero, so there is a trade-off between the two. In the grand scheme of things, my whole program uses a lot of memory and spends about 10% of its time in garbage collection. Should I care about this large memory footprint? Can garbage collection “eat up” the savings from sparsity?
Detailed question
I am building a model with a lot of simple linear algebra operations on large matrices, iterated on thousands of times. I am providing a MWE below with one operation that exemplifies it: the matrix multiplication X' \Sigma^{-1}.
Here \Sigma^{-1} is block-diagonal with n \times n blocks and X is T \times n with a special structure containing a lot of zeroes. The matrices have a sparsity of 99% and 95%, respectively, for a typical n. Moreover, \Sigma^{-1} is a submatrix of a bigger matrix \Sigma_{full}^{-1}, where only a handful of the first rows and columns are “deleted”. I am providing examples with typical values for T and n, as well as with n = 17 (spoiler: @views and mul!() with sparse matrices do not work with that n).
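For reference, the block-diagonal structure (and the resulting > 99% sparsity of \Sigma^{-1}) can be checked with the same kron construction used in the MWE below; the randn block here just stands in for the actual n \times n block:

```julia
using LinearAlgebra, SparseArrays

# Same construction as in the MWE: Σ⁻¹_full is block-diagonal,
# built as kron(I, block), one n×n block per time period.
Tf, n = 242, 2
block = randn(n, n)
Σ_invsp_full = kron(sparse(1.0I, Tf, Tf), sparse(block))

# Density is nnz/length = Tf*n^2 / (Tf*n)^2 = 1/Tf ≈ 0.004,
# i.e. the matrix is more than 99% sparse.
density = nnz(Σ_invsp_full) / length(Σ_invsp_full)
```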
My main code uses SparseArrays and I am comparing it to a dense multiplication. I am running the following lines from the MWE below:
@btime $XtΣ_inv = $Xsur_sp'*$Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end]; # sparse array implementation
@btime mul!($XtΣ_invD,$XsurD',$Σ_invD); # dense array implementation with mul!()
@btime mul!($XtΣ_invD,$XsurD',$Σ_inv_viewD); # dense array implementation with @view
@btime $XtΣ_inv = $Xsur_sp'*$Σ_invsp_view; # sparse array implementation with @view
@btime mul!($XtΣ_inv,$Xsur_sp',$Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end]); # sparse array implementation with mul!()
Let’s look first at the timings with n = 2:
n = 2, sparse array implementation: 59.300 μs (42 allocations: 478.45 KiB)
n = 2, dense matrix implementation: 83.000 μs (0 allocations: 0 bytes)
n = 2, dense matrix with view: 82.900 μs (0 allocations: 0 bytes)
n = 2, timing with sparse view: 129.793 ms (27 allocations: 395.69 KiB)
n = 2, timing with sparse mul!: 108.618 ms (16 allocations: 19.11 KiB)
On my main work computer the gap is even wider.
n = 2, sparse array implementation: 45.700 μs (42 allocations: 478.45 KiB)
n = 2, dense matrix implementation: 128.400 μs (0 allocations: 0 bytes)
I will call this line between 10,000 and 30,000 times on each run. For 10k calls the sparse footprint alone is ~5 GB. My full program shows a total time of 40.920611 seconds (25.06 M allocations: 174.057 GiB, 9.01% gc time), so ~5 GB of that come from this line alone.
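The ~5 GB figure is just the per-call allocation reported by @btime scaled up by the number of calls; a quick back-of-the-envelope check:

```julia
per_call = 478.45 * 1024              # bytes allocated per call, from @btime
calls = 10_000
total_GiB = per_call * calls / 2^30   # ≈ 4.56 GiB over 10k iterations
```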
No allocations, but, depending on the computer, between 1.x and 3x the time for dense arrays.
If I go for n = 17, the gain from sparse is much greater, but the allocations are worse: ~171 MiB for one line!
n = 17, sparse array implementation: 37.380 ms (42 allocations: 171.29 MiB)
n = 17, dense matrix implementation: 239.931 ms (0 allocations: 0 bytes)
n = 17, dense matrix with view: 223.648 ms (0 allocations: 0 bytes)
n = 17, sparse view doesn't complete on my machine
n = 17, sparse mul!() doesn't complete on my machine
So I am wondering: should I care about these allocations? I do a lot of such manipulations and most of the code uses sparse matrices, which probably explains the current large footprint. If I bother to rewrite the code with normal matrices, I expect to reduce the allocations drastically, and probably the gc time too, but would that be worth it in terms of speed? In the end I only care about the total time of the program.
Last point: I don’t know if it is a bug (see this thread and the github issue there), but using @view or mul!() with sparse arrays is over 1000 times slower for n = 2, and for n = 17 it doesn’t complete on my machine. This is not a @btime artifact; running the line alone behaves the same.
MWE:
using LinearAlgebra
using SparseArrays
using BenchmarkTools
# -------------------------------------------
# this code simply generates the necessary matrices
Tf = 242; p = 5; T = Tf - p; n = 2;
X = rand(T, n*p + 1);
repX = kron(X, ones(n, 1));
r, c = size(X);
idi = repeat(1:r*n, inner=c);
idj = repeat(1:c*n, r);
Σhatt_inv_sp = randn(n, n);
Σ_invsp_full = kron(sparse(1.0I, Tf, Tf), Σhatt_inv_sp);
# -------------------------------------------
# main matrix No. 1
Xsur_sp = sparse(idi,idj,vec(repX'));
# main matrix No. 2
Σ_invsp = Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# alternative as a view
Σ_invsp_view = @view Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# the line in question using sparse matrices
XtΣ_inv = Xsur_sp'*Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# timing
print("n = 2, sparse array implementation:")
@btime $XtΣ_inv = $Xsur_sp'*$Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# -------------------------------------------
# alternative with dense matrices
# -------------------------------------------
XsurD = Matrix(Xsur_sp);
Σ_inv_full = Matrix(Σ_invsp_full);
XtΣ_invD = Matrix(XtΣ_inv);
Σ_invD = Matrix(Σ_invsp);
Σ_inv_viewD = @view Σ_inv_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# the line in question with dense matrices
mul!(XtΣ_invD,XsurD',Σ_invD);
# timing
print("n = 2, dense matrix implementation:")
@btime mul!($XtΣ_invD,$XsurD',$Σ_invD);
# timing with the view
print("n = 2, dense matrix with view:")
@btime mul!($XtΣ_invD,$XsurD',$Σ_inv_viewD);
# -------------------------------------------
# Side note: these sparse @view / mul!() variants are over 1000x slower for n = 2
# and just hang forever for n = 17
print("n = 2, timing with sparse view:")
@btime $XtΣ_inv = $Xsur_sp'*$Σ_invsp_view;
print("n = 2, timing with sparse mul!:")
@btime mul!($XtΣ_inv,$Xsur_sp',$Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end]);
# n = 17
# -------------------------------------------
# this code simply generates the necessary matrices
Tf = 242; p = 5; T = Tf - p; n = 17;
X = rand(T, n*p + 1);
repX = kron(X, ones(n, 1));
r, c = size(X);
idi = repeat(1:r*n, inner=c);
idj = repeat(1:c*n, r);
Σhatt_inv_sp = randn(n, n);
Σ_invsp_full = kron(sparse(1.0I, Tf, Tf), Σhatt_inv_sp);
# -------------------------------------------
# main matrix No. 1
Xsur_sp = sparse(idi,idj,vec(repX'));
# main matrix No. 2
Σ_invsp = Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# alternative as a view
Σ_invsp_view = @view Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# the line in question using sparse matrices
XtΣ_inv = Xsur_sp'*Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# timing
print("n = 17, sparse array implementation:")
@btime $XtΣ_inv = $Xsur_sp'*$Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# -------------------------------------------
# alternative with dense matrices
# -------------------------------------------
XsurD = Matrix(Xsur_sp);
Σ_inv_full = Matrix(Σ_invsp_full);
XtΣ_invD = Matrix(XtΣ_inv);
Σ_invD = Matrix(Σ_invsp);
Σ_inv_viewD = @view Σ_inv_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end];
# the line in question with dense matrices
mul!(XtΣ_invD,XsurD',Σ_invD);
# timing
print("n = 17, dense matrix implementation:")
@btime mul!($XtΣ_invD,$XsurD',$Σ_invD);
# timing with the view
print("n = 17, dense matrix with view:")
@btime mul!($XtΣ_invD,$XsurD',$Σ_inv_viewD);
# Side note: mul! with sparse matrices just hangs forever for n=17
println("n = 17, timing with sparse view never completes on my machine")
# @btime $XtΣ_inv = $Xsur_sp'*$Σ_invsp_view;
println("n = 17, timing with sparse mul!() never completes on my machine")
# @btime mul!($XtΣ_inv,$Xsur_sp',$Σ_invsp_full[(Tf-T)*n+1:end, (Tf-T)*n+1:end]);