Sparse matrices copy arrays, do not view them

I wrote this program, which preallocates the csc arrays.
I have to do a lot of kronecker products and i wanted to re-use this arrays and to write in them.
Each iteration would have a different @view inside them, but if I use @view, it gives me an error.

N::Int32 = 4
m::Int32 = 2

nzval_Sz = Vector{Int8}(undef, N)
nzval_Sz[1] = 1
nzval_Sz[2] = -1

rowval_Sz = Vector{Int32}(undef, N)
rowval_Sz[1] = 1
rowval_Sz[2] = 2

colptr_Sz = Vector{Int32}(undef, N+1)
colptr_Sz[1] = 1
colptr_Sz[2] = 2
colptr_Sz[3] = 3 #last one is tot_nnz+1

sparsematrix = SparseMatrixCSC(m, m, colptr_Sz[1:m+1], rowval_Sz[1:m], nzval_Sz[1:m])

nzval_Sz[1] = 22

display(sparsematrix)

I tried to see if they were making an automatical view of these vectors, but after changing a value of nzval, the matrix didn’t change.

How can i make views of these vectors?
Can I use array instead of vector, or does it lead to internal conversions?
Why did they choose vectors?

related unanswered questions:

P.S: If i write my own sparsematrixcsc subtype, just like i did fo some matrices before, do I need to rewrite most of the operations, or do they already work using views etc?

If I want to save the kronecker product:

result = kronecker(sparse_Identity, sparsematrix)

not as a standalone sparse matrix, but instead re-using sparsematrix itself, is it possible, or is it going to dodge what i write, to instead do what he wants?

related question:

I don’t actually understand your question, but one thing that might help clarify, you say:

but:

julia> Vector{Int32}
Vector{Int32} (alias for Array{Int32, 1})

Vector{T} is just Array{T, 1}, i.e. an Array with the dimension parameter set to 1. So saying “can I use an array instead of a vector” doesn’t really make sense, unless you are asking for an array with specific dimensions (e.g. “can I use an Array{Int32, 2}” - which is a Matrix - or Array{Int32, 7})

I heard somewhere that it would lead to a conversion, thus my confusion regarding the topic (I asked to stack overflow before asking here, cause i didn’t want to flod this place with questions, but it seems it’s more accurate to just ask here to not incur in false informations)

You cannot really reuse the memory of a sparse matrix - especially not when the result has much more entries.

You can change the internals of a SparseMatrixCSC directly if you wish - however that might be dangerous if you don’t truly know what you are doing. So I’d recommend not doing that. It looks like:

sparsematrix.nzval[1] = 22

If you know the coordinates of the value you want to change, just do it directly:

sparsematrix[1,1] = 22

This does not allocate a new matrix (or anything at all) if there was a value at that place before hand.

By the way:

sparsematrix = SparseMatrixCSC(m, m, colptr_Sz[1:m+1], rowval_Sz[1:m], nzval_Sz[1:m])

Here you explicitely copy the matrices colptr_Sz, rowval_Sz and nzval_Sz. Slicing in Julia copies! You could try again with @views in front. This macro converts every slicing operation to a view instead:

@views sparsematrix = SparseMatrixCSC(m, m, colptr_Sz[1:m+1], rowval_Sz[1:m], nzval_Sz[1:m])

I esplicitely changed an element of those arrays, to see if sparsematrixcsc was copying them or not.

I tried with @views, but it still syas:

ERROR: LoadError: MethodError: no method matching SparseMatrixCSC(::Int32, ::Int32, ::SubArray{Int32, 1, Vector{Int32}, Tuple{UnitRange{Int64}}, true}, ::SubArray{Int32, 1, Vector{Int32}, Tuple{UnitRange{Int64}}, true}, ::SubArray{Int8, 1, Vector{Int8}, Tuple{UnitRange{Int64}}, true})

Closest candidates are:
  SparseMatrixCSC(::Integer, ::Integer, ::Vector, ::Vector, ::Vector)
   @ SparseArrays ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/SparseArrays/src/sparsematrix.jl:34
  SparseMatrixCSC(::Any, ::Any, ::SparseArrays.ReadOnly, ::SparseArrays.ReadOnly, ::Vector)
   @ SparseArrays ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/SparseArrays/src/sparsematrix.jl:47
  SparseMatrixCSC(::UniformScaling, ::Integer, ::Integer)
   @ SparseArrays ~/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/share/julia/stdlib/v1.10/SparseArrays/src/sparsematrix.jl:2190

About

You cannot really reuse the memory of a sparse matrix - especially not when the result has much more entries.

I thought of preallocating the array, so that it has still space to accomodate the result. This would lead in less allocations.

Note that I’m not going to append values at the end of the slicing, still inside the preallocated array. Instead I need to rewrite all the values of the array (if we have just 1 kronecker product and a preallocated array of 4 elements. If we have 1 kronceker product and the preallocated array was of 8 elements, then i don’t need to rewrite all its elements, but just half of them).

I just wanted some stuff to do it in place, but if it needs a temporary array, what I’m doing (avoiding excessive allocations) could be meaningless

Ok that is an unfortunate error because the constructor expects proper arrays. You could use resize! to shrink the arrays instead. This does not change the underlying memory, so when the code uses resize! again to make room for new elements, no actual memory needs to be allocated. More Julia would be to use sizehint! (manual entry) instead.

N::Int32 = 4
m::Int32 = 2
nzval_Sz = [1,-1]
rowval_Sz = [1,2]
colptr_Sz = [1,2,3]#last one is tot_nnz+1

# this could cause allocations because space is preallocated
Base.sizehint!(nzval_Sz, N)
Base.sizehint!(rowval_Sz, N)
Base.sizehint!(colptr_Sz, N+1)

sparsematrix = SparseMatrixCSC(m, m, colptr_Sz, rowval_Sz, nzval_Sz)

nzval_Sz[1] = 22 # now changes the contents of sparsematrix

Most operations with SparseArrayCSC are not in-place. This whole preallocation buisness done above only works for operations where SparseArrayCSC uses resize! (or push! or similar) to increase the size of the vectors, which is the case for setting the value at an index but not much else I think.

I still think you are worrying too much about performance details before you have an actual functioning code. I think a good approach to (numerical) computing is

  1. Make it work (i.e. have something that produces some output)
  2. Make it right (i.e. the output is correct)
  3. Make it fast

in that order.

1 Like

This tells you all of the problem: Sparse matrices expect Vector. This is not a case of overly restrictive dispatch – this is really expected in the sparse matrix type.

Now julia Vector is in some sense a glorified pointer. So you can do the following:

julia> sparsematrix = SparseMatrixCSC(m, m, unsafe_wrap(Array, pointer(colptr_Sz), (m+1,)) , unsafe_wrap(Array, pointer(rowval_Sz), (m,)) , unsafe_wrap(Array, pointer(nzval_Sz), (m,)))
2×2 SparseMatrixCSC{Int8, Int32} with 2 stored entries:
 1   ⋅
 ⋅  -1

julia> nzval_Sz[1] = 22
22

julia> display(sparsematrix)
2×2 SparseMatrixCSC{Int8, Int32} with 2 stored entries:
 22   ⋅
  ⋅  -1

Use of unsafe_wrap implies that you are now responsible for memory management. This especially means that you must make sure to keep colptr_Sz, rowval_Sz and nzval_Sz alive until the last access to the wrapped array (on pain of use-after-free bugs). Furthermore, you cannot safely resize these arrays anymore until the last access to the wrapped array (that would realloc their memory, leading to the same kind of use-after-free bug).

Otherwise, this should behave like a view for most purposes (including the small allocation for the wrapper object).

PS. It might be possible to punt on the memory management by installing a finalizer on the wrapped array that keeps the parent alive. Does anyone here know whether that (1) works in practice and (2) works in spec (i.e. is officially supported)? The docs for finalizer already suggest that finalizers on arrays must throw, but that is probably a bug in the docs and not the implementation?

1 Like

Is it then possible to overwrite the contents of these arrays, which are the storage of SparseMatrix (and in our case only m elements of these N-element-arrays are used) with the result of kronecker (which creates an N-element-array) without temporary arrays?

If it’s not possible, then, even if this is kind of useful because we are not duplicating the inputs, I don’t think it gives me a real advantage as I need to reach really big matrices after multiple kronecker products (and consequent multiple internal allocations)

I already have an algorithm which works, so I’m just testing the different options, before creating something that is hard to rewrite from the beginning.

However I agree with those points. It’s just an easy NRG algorithm the one I’m implementing. I would then pass to tensor networks, so before doing it I would like to consolidate my Julia basis

You can look at what kron does when called with SparseMatrixCSC:

julia> using LinearAlgebra, SparseArrays;
julia> sparsematrix = sprand(5,5,0.1); 
julia> @less kron(sparsematrix, sparsematrix) # opens text viewer at code that would run

There we can see this code:

function kron(A::AbstractSparseMatrixCSC, B::AbstractSparseMatrixCSC)
    mA, nA = size(A)
    mB, nB = size(B)
    mC, nC = mA*mB, nA*nB
    Tv = typeof(oneunit(eltype(A))*oneunit(eltype(B)))
    Ti = promote_type(indtype(A), indtype(B))
    C = spzeros(Tv, Ti, mC, nC)
    sizehint!(C, nnz(A)*nnz(B))
    return @inbounds kron!(C, A, B)
end

This:

  1. computes the size of the new matrix,
  2. creates a new SparseMatrixCSC and uses sizehint! to allocate the appropriate size and then
  3. uses kron! to perform the kronecker product into the newly allocated array.

This is interesting because you can of course allocate the output array yourself and call kron! to overwrite it. Note that this exclamation mark ! at the of function names indicates that this functions mutates (some of) its inputs and you need to take special care when using them! Most importantly, in this case you cannot reuse one of the input arrays as output array like kron!(A,B,A). So you need to allocate a 3rd array.

1 Like

If you already have a working code, you should profile it to see which part is slowest and optimize that. There is no point in optimizing a part of the program that makes up only 1% of the runtime (because you can at most gain 1% of runtime).

If you need help with profiling, or interpreting the results or optimizing the relevant parts of the code, you are welcome to post those things here and usually a couple of people will give you helpful comments on how to optimize the hell out of your code snippet :smiley:

1 Like

If I know I have to do x kronecker products, I could preallocate a single array with the sole purpose of storing all the temporary arrays. Thus allocating 1 time the biggest one (and slicing/viewing it for the various iterations), instead of allocating and garbage collecting x temporary arrays.

Is it faster? How can I say to him to use the preallocated one (viewing a slice [1:m] of it)?

( I need to see how kronecker of Kronecker.jl works and if I could do it the exact same way I would do it for kron)

Example:

I need to define:

Sparse_matrix

And:

A = kronecker(Identity, sparse_matrix) 

In the actual algorithm there are even other operations involved

Now A has double the dimensions of sparse matrix. In the next iteration I need to do:

B = kronecker(Identity, A) 

That means that I’m no more using sparsematrix.
That also means that if I allocate 2 big arrays and view into them, I could not spend time in new allocation and garbage collections

Thanks, I will do it later in this month. I still need to understand how to profile well. Up until now I’m just judging the complexity of the various algorithms inside the program and doing it manually, but I know I need to switch to a proper method. I just don’t have time right now.

This only a measurement can tell you. Thus I keep recommending to actually implement something before wasting hours on improving something that might not need improvement (and the risk your “improvement” actually turns out worse!).

You don’t do that directly. You just create an empty SparseMatrixCSC and use sizehint! to tell it the maximal size it will need. Then you keep using that and it won’t allocate until you exceed that size.

If your matrices are non-tiny, I think you should think really hard about whether materializing kronecker products is a productive way forward.

The amount of data stored in C = kron(A,B) is the product of the amounts of data stored in A and B; but in reality you only need the sum. So you have just pissed away an entire polynomial order of complexity. It would be simple to write a wrapper matrix type that lazily evaluates the kronecker – and that would probably be faster for read-access (memory transfers more important than compute).

So this depends very much on what you plan to do with the kron products!

So whatever you want to do with C, think hard whether that thing can work directly on A and B. Maybe ask around here. Maybe take another look at why you thought kronecker product was the right way forward – people who prove theorems often write something convenient, with the implicit understanding that nobody would ever implement that in this stupid way (e.g. Leibniz formula for determinants with a sum over all permutations, lol)…

Apologies if you already conclusively know that you must materialize the kron.

1 Like

There are 2 ways:

Kronecker for matrices
Outer product for tensors.

I’m now using the first way.

I will provide the naïve program in few hours

Outer products have the same issue.

Take the example where you need to work with projections onto a subspace. That is, the linear map x -> x - v * (v' * x) for some unit vector v.

This is a fine linear operator, and in most cases you would be stupid to represent that as a matrix LinearAlgebra.Diagonal(fill(1.0, length(v))) - v * v'. Not just because of construction speeds, but for everything.

(it makes sense for small known dimensions using StaticArrays, though – and there surely are other settings where you want to represent that as a matrix. Just think very hard whether that is good!)

1 Like

I’ve wrote this simplified piece of code, but I’m stuck, as julia has no pointers like C,C++

using SparseArrays
using LinearAlgebra
using Kronecker

#Definition of starting parameters
Delta::Float64 = 0
max_states::Int16 = 10 
NIter::Int32 = 2 #Final lattice size: 2*NIter + 2
init_rowsize::Int32 = 2

# Sz = Array{Float64}(undef, 2, 2)
# Sp = Array{Float64}(undef, 2, 2)
# Sm = Array{Float64}(undef, 2, 2)

Sz = [[0.5, 0] [0, -0.5]]
Sp = [[0, 0] [1.0, 0]]
Sm = [[0, 1.0] [0, 0]]
Ham = zeros(2,2)
# If I first preallocate them, and then check their type, the last statement overwrites the preallocation. 
#I would like to use this syntax and to control their type. Didn't found anything. Same for Ham.

#In C I would have wrote these Block_stuff as pointers ,see in the loop
Block_Sp = Sp
Block_Sm = Sm
Block_Sz = Sz
Block_Ham = Ham

for i = 1:max_states
    actual_rowsize = 2^i

    #Hamiltonian constructor
    H_super = sparse( kroneckersum(Block_Ham, Block_Ham) .- Delta*kronecker(Block_Sz, Block_Sz) .+ 0.5*(kronecker(SBlock_Spp, Block_Sm) .+ kronecker(Block_Sm, Block_Sp)) )
    H_dense_super = collect(H_super) #I make a dense matrix as this is a dummy example with full diagonalization.

    #Higher dimension kron
    Block_Ham = H_super
    Block_Sz = sparse(kronecker(sparse(I, actual_rowsize, actual_rowsize), Block_Sz))
    Block_Sp = sparse(kronecker(sparse(I, actual_rowsize, actual_rowsize), Block_Sp))
    Block_Sm = sparse(kronecker(sparse(I, actual_rowsize, actual_rowsize), Block_Sm))

end

Now I’ve made Block_“stuff” ambigous, cause I globally associated it with “stuff”
What I wanted to do is to create a pointer to this stuff, create a higher dimensional “stuff” with kron, and change the address of the pointers to the new “stuff”.
This way I’m able to use this pointer inside the construction of H_super iteratively

I don’t know if there is a good way or if the clearest way is to preallocate 2 big sparse arrays for each “stuff” and go with them, without removing them from memory

P.S: the initial Ham is a 0 matrix, but doing the kronecker product out of 0 matrix leds to an error.

You should think of the mathematical notion of the Kronecker product as a means to compose a linear operator without forming it.

Remember, kronecker(A,B) * x does not do kron(A,B)*x but reshape(B * reshape(x,n,n) * A', n*n,1) which is much cheaper in time and space. Make sure you understand why.

I see two distinct approaches here:

  1. Build a sparse matrix. The kron of sparse matrices is sparse, so if you make the initial arrays Sz, etc, sparse, then the result will be sparse. But sparse forces the formation of the Kronecker product, so do not do sparse(kronecker()) use simply kron

  2. Build a hierarchy of kronecker operators. Because kronecker does not form the Kronecker product, you do get what you want with the pointers to matrices that cannot be done with approach 1. See

julia> A = ones(2,2)
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0

julia> M = kronecker(A,A)
4×4 Kronecker.KroneckerProduct{Float64, Matrix{Float64}, Matrix{Float64}}:
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0

julia> A .= 2  
2×2 Matrix{Float64}:
 2.0  2.0
 2.0  2.0

julia> M
4×4 Kronecker.KroneckerProduct{Float64, Matrix{Float64}, Matrix{Float64}}:
 4.0  4.0  4.0  4.0
 4.0  4.0  4.0  4.0
 4.0  4.0  4.0  4.0
 4.0  4.0  4.0  4.0

Finally, initialize your matrices as sparse, so the result will be sparse. Do not use sparse within the for loop.

The main issue is that, written A, I need to create a new_A using A.
In addition new_A is of different dimension than A.

I didn’t really understand your point 2, even if I understand that kronecker products produce an expression: not allocated stuff. However I’m writing a simple example. In general I would like to diagonalize it, so I need it fully allocated.

I would have created something like this, but i didn’t run it. just to sketch the idea. Don’t know if it’s functional.

#Definition of constant
NIter::Int32 = 3
init_N::Int32 = 2 #inizial size of matrices
N::Int32 = init_N ^ NIter #final size of matrices

#sparsization of Sz
nzval_Sz = Vector{Int8}(undef, N)
nzval_Sz[1] = 0.5
nzval_Sz[2] = -0.5
rowval_Sz = Vector{Int32}(undef, N)
rowval_Sz[1] = 1
rowval_Sz[2] = 2
colptr_Sz = Vector{Int32}(undef, N+1)
colptr_Sz[1] = 1
colptr_Sz[2] = 2
colptr_Sz[3] = 3 #last one is tot_nnz+1

#definition of temporary arrays
nzval_Sz_temp = Vector{Int8}(undef, N)
rowval_Sz_temp = Vector{Int32}(undef, N)
colptr_Sz_temp = Vector{Int32}(undef, N+1)


for i = 1:NIter
    m = 2^i
    m_next = 2^(i+1)

    if i%2 != 0
        Sz = SparseMatrixCSC(m, m, unsafe_wrap(Array, pointer(colptr_Sz), (m+1,)) , unsafe_wrap(Array, pointer(rowval_Sz), (m,)) , unsafe_wrap(Array, pointer(nzval_Sz), (m,)))
        #H_super(Sz)  
        Sz_temp = SparseMatrixCSC(m_next, m_next, unsafe_wrap(Array, pointer(colptr_Sz), (m_next+1,)) , unsafe_wrap(Array, pointer(rowval_Sz), (m_next,)) , unsafe_wrap(Array, pointer(nzval_Sz), (m_next,)))    
        Sz_temp .= kronecker(sparse(I, m, m), Sz) #I'm writing on Sz_temp arrays #that means that next Sz I need to use is Sz_temp
    else
        Sz_temp = SparseMatrixCSC(m, m, unsafe_wrap(Array, pointer(colptr_Sz), (m+1,)) , unsafe_wrap(Array, pointer(rowval_Sz), (m,)) , unsafe_wrap(Array, pointer(nzval_Sz), (m,)))    
        #H_super(Sz_temp)
        Sz = SparseMatrixCSC(m_next, m_next, unsafe_wrap(Array, pointer(colptr_Sz), (m_next+1,)) , unsafe_wrap(Array, pointer(rowval_Sz), (m_next,)) , unsafe_wrap(Array, pointer(nzval_Sz), (m_next,)))
        Sz .= kronecker(sparse(I, m, m), Sz_temp) #I'm writing on Sz arrays #that means that next Sz I need to use is Sz
    end

end

This way I alternated the use of the temporary arrays, given the iteration index being odd or even.

A clearer version is the following, but it requires to somehow delete some data (I still don’t know how it’s done on julia and can’t find anything regarding manual deallocation of stuff)

#definition of constants
NIter::Int32 = 3
init_N::Int32 = 2

#sparsization of Sz
nzval_Sz = Vector{Int8}(undef, init_N)
nzval_Sz[1] = 0.5
nzval_Sz[2] = -0.5
rowval_Sz = Vector{Int32}(undef, init_N)
rowval_Sz[1] = 1
rowval_Sz[2] = 2
colptr_Sz = Vector{Int32}(undef, init_N+1)
colptr_Sz[1] = 1
colptr_Sz[2] = 2
colptr_Sz[3] = 3

Sz = SparseMatrixCSC(init_N, init_N, colptr_Sz, rowval_Sz, nzval_Sz) #copy of Sz

for i = 1:NIter
    m = 2^i
    Block_Sz = Sz #copy of the data
    #somehow delete Sz
    #H_super(Block_Sz)
    Sz = kronecker(sparse(I, m, m), Block_Sz) #definition of new Sz
    #somehow delete Block_Sz
end

however it involves a lot of allocations and deallocation if NIter is big.
The data involved is of the same order, as there are always 2 sparse matrices at the same time.