I’m trying to understand why the SparseSuite \ is so much worse than what I get with BandedMatrices.
My test case, which reflects the results of my application, uses two matrices with the same elements, but stored differently.
So I run the solvers FVS\b and FVB\b and I get this
julia> @btime $FVB\$b;
2.226 ms (11 allocations: 1.37 MiB)
julia> @btime $FVS\$b;
22.556 ms (73 allocations: 25.04 MiB)
I figured the generic spase solve would be worse, but not that much worse. Is there anything I shoud be doing that I’ve missed? Is there a direct call to a function inside SparseSuite that could help?
SparseMatrices are inherently slower because they don’t really work well with things like simd, and they also cause branch prediction issues. Both of those are really big deals.
and seems to work very will inside Matlab, but my Matlab experience with it was on pretty small problems and I was only comparing it to the previous sparse solvers in Matlab.
It doesn’t matter how good you are if you have no information. If you can specialize on properties of the system, you can do better. SuiteSparse is as good as a completely generic sparse algorithm can get, but completely sparse generic is too generic. Even good sparse algorithms work by trying to add structure.
Fair enough. SuiteSparse has always done what I’ve asked it to do and been fast enough. This is the first time I’ve done a comparison like this however.
I’ve never compared the LAPACK band solver vs SuiteSparse on the same problem. In hindsight, I should have expected this. Not disappointed in SparseSuite and I knew band solvers are fast. I did not know that the difference would be so large on what seems to be an easy problem for both.
I tried this in Matlab (same computer and OS). Matlab does some analysis before a solve (looking for spd, bands, … ) and is smart enough to use the band solver and hence is as fast as Julia. The timings (assuming you believe tic and toc) are almost identical.
>> n=20000; em2=rand(n,1); em1=rand(n,1); em0=rand(n,1);
>> e1=rand(n,1); e2=rand(n,1);
>> A=spdiags([em1 em2 em0 e1 e2], -2:2, n,n);
>> b=rand(n,1);
>> tic; c=A\b; toc
Elapsed time is 0.002861 seconds.
>> tic; [l,u]=lu(A); toc
Elapsed time is 0.005762 seconds.
I do not know how to force Matlab to use SuiteSparse. Mathworks hired Tim Davis as a consultant to get it in there, so it would be interesting to see how/if they to better than Julia.
Where should I open that issue? Base? SuiteSparse?
There’s some subtlety happening. If you want to do qr! or lu!, you need to allocate a few extra bands. If all you’re doing is backslash, then that extra room needs some management.
Winner = no surprise
Margin of victory = some surprise
I’m moving a lot of my work from Matlab → Julia and finding new things all the time. Most of what I see is well explained if I RTFM. This one was bit different.
I wonder how much it might cost to detect the “bandedness” and reorganize the matrix. In other words, is the detection and reorganization done after a solve is called, or before. If the latter, this cost is not part of the timing.
The bandedness cost would be O(n) (check each col’s greatest and least row element). The conversion is O(nd+nnz) since you need to allocate a zero banded matrix and fill in the non-zero entries.