How to speed up creating a sparse matrix?

I am attempting to create a sparse matrix from an array of arrays called S . My code is this:

function make_matrix(nostates, S)
    M = spzeros(Int32, nostates, nostates)
    convert(SparseMatrixCSC{UInt32, UInt32}, M)
    for colno in tqdm(1:nostates)
        for rowno in 1:length(S)
            if S[rowno][colno] > 0
                M[ colno, S[rowno][colno]] += 1
            end
        end
    end
    return M
end

S has 4 rows and 179765 columns. This code takes around 10 minutes for some reason. Is there something obvious I can do to see why?

Here is some type information:

typeof(S)
Array{Array{Int32,1},1}

nostates is 179765

It is therefore doing 719060 operations on the sparse matrix so something must be really slow currently.

Yes, if you think about the way a SparseMatrixCSC is stored it becomes clear that inserting new non-zero entries is quite expensive (in general requires moving a lot of data). Doing it almost a million times then becomes very expensive.

The way you do it is by creating three vectors containing all the non zero rows, columns and values respectively and then calling sparse on that in the end

julia> I = [1, 4, 3, 5]; J = [4, 7, 18, 9]; V = [1, 2, -5, 3];

julia> S = sparse(I,J,V)
5×18 SparseMatrixCSC{Int64,Int64} with 4 stored entries:
  [1,  4]  =  1
  [4,  7]  =  2
  [5,  9]  =  3
  [3, 18]  =  -5

As an unrelated remark, you have:

typeof(S)
Array{Array{Int32,1},1}

which means that S is a vector of vectors. You probably want to make that a Matrix instead.

2 Likes

Should probably be

M = convert(SparseMatrixCSC{UInt32, UInt32}, M)

(just a side node)