# Efficient Initialization of huge sparse arrays

There are two ways to initialize an N×N sparse matrix whose entries are read from one or more text files. Which one is faster? I need the more efficient one, as N is large, typically 10^6.

1. I could store the (x,y) indices in arrays `x`, `y` and the entries in an array `value`, and then call
`K = sparse(x, y, value);`
2. I could declare
`K = spzeros(N, N);`
then read the (i,j) coordinates and values `v` and insert them one at a time as
`K[i,j] = v;`
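A minimal sketch of option 1, assuming (hypothetically) that each text file holds one whitespace-separated `i j v` triplet per line with 1-based indices; `read_sparse` is an illustrative name, not a library function. Passing `N` explicitly to `sparse` fixes the size to N×N even when the last rows or columns happen to have no entries.

```julia
using SparseArrays

# Hypothetical file format (an assumption): one "i j v" triplet per line,
# whitespace-separated, 1-based indices.
function read_sparse(paths::Vector{String}, N::Int)
    x = Int[]; y = Int[]; value = Float64[]
    for path in paths
        for line in eachline(path)
            isempty(strip(line)) && continue      # skip blank lines
            fields = split(line)
            push!(x, parse(Int, fields[1]))
            push!(y, parse(Int, fields[2]))
            push!(value, parse(Float64, fields[3]))
        end
    end
    # One call builds the whole matrix; repeated (i, j) pairs are summed by default.
    return sparse(x, y, value, N, N)
end
```

If the number of triplets is known in advance, `sizehint!` on `x`, `y` and `value` can avoid some of the regrowth of the vectors while reading.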


Don’t insert values one by one: that will be tremendously inefficient, since each new nonzero has to be inserted into the middle of the matrix's internal storage arrays, which are shifted and reallocated over and over again.
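A small illustration of why, using the compressed sparse column (CSC) layout that `SparseMatrixCSC` uses: inserting a new nonzero into an early column moves every stored entry after it and updates `colptr` for all later columns, so each insertion costs roughly in proportion to what comes after the insertion point. The tiny 3×3 matrix is just an example.

```julia
using SparseArrays

# A 3×3 matrix with nonzeros at (1,1) and (3,3), stored column by column.
K = sparse([1, 3], [1, 3], [1.0, 3.0], 3, 3)
K.colptr      # [1, 2, 2, 3]: column j occupies slots colptr[j]:colptr[j+1]-1
rowvals(K)    # [1, 3]
nonzeros(K)   # [1.0, 3.0]

# Inserting a new nonzero into column 1 shifts the later stored entries
# one slot to the right and bumps colptr for columns 2 and 3.
K[2, 1] = 2.0
K.colptr      # [1, 3, 3, 4]
rowvals(K)    # [1, 2, 3]
nonzeros(K)   # [1.0, 2.0, 3.0]
```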


Thanks, that is exactly the information I needed.

You can also use BenchmarkTools.jl to verify this:

```julia
julia> using SparseArrays

julia> using BenchmarkTools

julia> I = rand(1:1000, 1000); J = rand(1:1000, 1000); X = rand(1000);

julia> function fill_spzeros(I, J, X)
           x = spzeros(1000, 1000)
           @assert axes(I) == axes(J) == axes(X)
           @inbounds for i in eachindex(I)
               x[I[i], J[i]] = X[i]
           end
           x
       end
fill_spzeros (generic function with 1 method)

julia> @btime sparse($I, $J, $X);
  10.713 μs (12 allocations: 55.80 KiB)

julia> @btime fill_spzeros($I, $J, $X);
  96.068 μs (22 allocations: 40.83 KiB)
```
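One caveat when comparing the two functions in the benchmark: with `rand(1:1000, 1000)` some (i, j) pairs can repeat, and the two approaches treat repeats differently. `sparse` combines duplicates (with `+` by default, or with the optional `combine` argument of `sparse(I, J, V, m, n, combine)`), whereas `x[i, j] = v` simply overwrites. A small check with made-up data:

```julia
using SparseArrays

I = [1, 1, 2]; J = [1, 1, 2]; V = [1.0, 2.0, 5.0]   # (1,1) appears twice

A = sparse(I, J, V)              # default: duplicates are summed
B = sparse(I, J, V, 2, 2, max)   # custom combine function keeps the larger value

C = spzeros(2, 2)                # element-wise insertion: the last write wins
for k in eachindex(I)
    C[I[k], J[k]] = V[k]
end

(A[1, 1], B[1, 1], C[1, 1])      # (3.0, 2.0, 2.0)
```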

Just for completeness: if you need to build or update sparse matrices repeatedly, also have a look at the parent method `SparseArrays.sparse!` (not exported), which lets you pass in preallocated storage for the intermediate and result arrays.
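A sketch of reusing the workspace across rebuilds, assuming the 13-argument expert method described in the `sparse!` docstring (workspace requirements: `klasttouch` of length ≥ n, `csrrowptr` of length ≥ m + 1, `csrcolval` and `csrnzval` of length ≥ `length(I)`; the result arrays are resized as needed). Please check the docstring for your Julia version; `make_rebuilder` is a hypothetical helper name.

```julia
using SparseArrays

# Hypothetical helper: allocate all workspace once, return a closure that
# rebuilds an m×n matrix from (I, J, V) triplets of length ≤ len.
function make_rebuilder(m::Int, n::Int, len::Int)
    klasttouch = Vector{Int}(undef, n)
    csrrowptr  = Vector{Int}(undef, m + 1)
    csrcolval  = Vector{Int}(undef, len)
    csrnzval   = Vector{Float64}(undef, len)
    csccolptr  = Vector{Int}(undef, n + 1)
    cscrowval  = Int[]        # resized by sparse! to nnz of the result
    cscnzval   = Float64[]
    return (I, J, V) -> SparseArrays.sparse!(I, J, V, m, n, +,
        klasttouch, csrrowptr, csrcolval, csrnzval,
        csccolptr, cscrowval, cscnzval)
end
```

Note that the returned matrix shares its storage with the workspace, so the next call overwrites it; `copy` the result if you need to keep it.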

EDIT: incidentally, you can even avoid allocating, building and processing your `x`, `y`, `value` arrays if you can generate the nonzeros of your matrix ordered by column (and by increasing row within each column). Then you can build a `SparseMatrixCSC` directly by filling its internal `colptr`, `rowval` and `nzval` fields yourself. Not sure you want to mess with such details, though I think you can gain quite a bit of efficiency this way.
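A sketch of that idea for a matrix whose structure is known up front. The tridiagonal matrix and the name `tridiag_csc` are purely illustrative; the point is that the nonzeros are produced column by column, with increasing row indices inside each column, and written straight into the three arrays handed to the `SparseMatrixCSC(m, n, colptr, rowval, nzval)` constructor.

```julia
using SparseArrays

# Hypothetical example: N×N tridiagonal matrix (2 on the diagonal, -1 next to it),
# generated column by column without any (i, j, v) triplet arrays.
function tridiag_csc(N::Int)
    nnz_total = 3N - 2                   # N diagonal + 2(N-1) off-diagonal entries
    colptr = Vector{Int}(undef, N + 1)   # column j uses slots colptr[j]:colptr[j+1]-1
    rowval = Vector{Int}(undef, nnz_total)
    nzval  = Vector{Float64}(undef, nnz_total)
    k = 1                                # next free slot in rowval/nzval
    for j in 1:N
        colptr[j] = k
        # rows must appear in increasing order within each column
        if j > 1
            rowval[k] = j - 1; nzval[k] = -1.0; k += 1
        end
        rowval[k] = j; nzval[k] = 2.0; k += 1
        if j < N
            rowval[k] = j + 1; nzval[k] = -1.0; k += 1
        end
    end
    colptr[N + 1] = k                    # one past the last stored entry
    return SparseMatrixCSC(N, N, colptr, rowval, nzval)
end
```

As far as I know the constructor does not sort row indices or merge duplicates, so the generator itself has to guarantee sorted, unique rows in every column.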


Thank you for the painstaking effort!

Thanks for the tip!