Help with this code (deallocation/pointers on sparse matrices)

So, instead of giving the full code (which still, to my surprise, doesn’t work) I want to give an analysis of single pieces, to not confuse with the little wall of text.

First: I need to multiply rectangular dense matrices with sparse quare matrices.
I thought of going with mul!, but it seems I have some issues.
If I write:

function foo()
    m=2
    # Block_Sz1 = Array{Float64}(undef, 4,2)
    Sz = [[0.5, 0] [0,-0.5]]
    A = [[0,1,0,0] [0,0,3,1]]
    Identity = [[1,0] [0,1]] 
    mul!(Block_Sz1, kronecker(Identity, Sz), A)
end
foo()

It says that Block_Sz1 is not defined.
I can define it: I tried removing the comment and it all works.
Then I tried benchmarking, but it seems that @btime doesn’t know of the existence of Block_Sz1, while @time succeds.
If I try to allocate Block_Sz1 for the sake of @btime (but still, running without using it), mul! throws a big error:

LoadError: InexactError: Int64(-0.5) (...)

Don’t know why, cause the documentation of mul! it says:
Calculates the matrix-matrix or matrix-vector product ABAB and stores the result in Y, overwriting the existing value of Y. Note that Y must not be aliased with either A or B.

However, we can proceed further cause i have sparse matrices.
Is mul! the generalized one or sparse one too?
I found on internet this Mul! vs SparseMatrix

Which says that mul! is generic, while spmatmul! is the sparse specific one, however in the sparseArrays documentation I can’t find anything related to the keyword “mul”.
We can try mul! anyway.

From the previous example I understood I first need to preallocate the array as undefined, but I can’t create an undefined sparse array, so I will go with a dense one instead.
It ends up spending 5 times the memory and 4 times the time:

function foo()
    m=2
    Block_Sz1 = Array{Float64}(undef, 4,2)
    Sz_sparse = sparse([[0.5, 0] [0,-0.5]])
    A_sparse = sparse([[0,1,0,0] [0,0,3,1]])
    @time mul!(Block_Sz1, kronecker(sparse(I, m, m), Sz_sparse), A_sparse)
end

foo()

Even If I can manage to solve these issues, there are still uncertainties:

  1. reading on internet i need to preallocate to help the compiler, and it’s obviously required. I will try to experiment on this subject after having a functional program within the julia semantics.
  2. In the end I will do a triple product like O’ A O where ’ is the transposition, A is square and sparse, O is rectangular and dense. I’ve seen there are tools like octave and tullio who could help optimizing the performances
  3. i still wonder if it’s possible, after all these steps and related to step 1, to preallocate these sparse matrices before the loop and viewing slices of them. But i think it will come at the end of all this.