Problem
Does Julia have an efficient way of constructing a huge sparse matrix from a given list of entries u,v,w
, some of which can have the same locations u,v
, and in that case, their weights w
must be summed. Thus u,v,w
are input vectors and I wish to create a sparse matrix that has value w[i]
at position u[i],v[i]
. For instance, the Mathematica code
n=10^6; m=500*n;
u=RandomInteger[{1,n},m];
v=RandomInteger[{1,n},m];
w=RandomInteger[{-9, 9}, m]; AbsoluteTiming[
SetSystemOptions["SparseArrayOptions"->{"TreatRepeatedEntries"->1}];
a= SparseArray[{u,v}\[Transpose] -> w, {n,n}]; ]
requires 135sec and 60GB of RAM. The equivalent Python code
import scipy.sparse as sp
import numpy as np
import time
def ti(): return time.perf_counter()
n=10**6; m=500*n;
u=np.random.randint(0,n,size=m);
v=np.random.randint(0,n,size=m);
w=np.random.randint(-9,10,size=m); t0=ti();
a=sp.csr_matrix((w,(u,v)),shape=(n,n),dtype=int); t1=ti(); print(t1-t0)
needs 36sec and 20GB, but doesn’t support (2). The equivalent Julia code
using SparseArrays;
m=n=10^6; r=500*n;
u=rand(1:m,r);
v=rand(1:n,r);
w=rand(-9:9,r);
function f() a=sparse(u,v,w,m,n,+); end;
@time f()
needs 155sec and 26GB (the @time macro incorrectly reports that it used only 15GB).
Is there a way to make this construction more efficient?
Are there any Julia packages that are better at this?
Background
I have created a package for linear algebra and homological algebra computations. I did it in Mathematica, since the SparseArray
implementation (0) there is
- very efficient (fast and low RAM usage),
- supports exact fractions as matrix entries.
- supports polynomials as matrix entries.
However, it is not
- parallelized
- open source
- GPU or cluster enabled.
I am considering migrating to a different programming language. I have tried Python SciPy.sparse
, but it doesn’t satisfy (2). I’m not sure if C++ libraries support (2). As a test, I tried sorting an array of floats of size 10^9 and compared the performance of Mathematica, Python numpy
, and Julia ThreadsX
. I was incredibly impressed by the latter (3x faster than numpy, 10x than Mathematica). I am considering migrating to Julia, but I wish to first make sure that my library would perform better than in Mathematica.
P.S. I also asked this question here.