Ah I see now, can you specify how Ik changes?
If the update really is just Ik .+= i, then you may not need a for loop at all, even in the time domain. Hmm, actually I'm not entirely sure.
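To illustrate what I mean: if that really is the only way Ik changes, the per-step increments telescope to 1 + 2 + … + n = n(n+1)/2, so the Ik recurrence alone has a closed form. A tiny sketch with made-up Ik values (note this only collapses the Ik update, not the rest of the time loop, which may still be needed):

```julia
# Hypothetical Ik; the increments 1, 2, ..., n telescope to n(n+1)/2.
Ik = [1.0, 2.0, 3.0]
n = 100

Ik_loop = copy(Ik)
for i in 1:n
    Ik_loop .+= i          # the time-domain loop version
end

Ik_closed = Ik .+ n * (n + 1) / 2   # closed form, no loop needed
```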
As for reducing the number of times you need to create sparse matrices you can operate on it in place like how it was suggested earlier in this discussion:
for iter = 1:100
    @. F.nzval = iter * F.nzval + rand() # update nonzero entries in-place
end
So now you have two matrices that you want to update in place, one that is the final F matrix you want to return (which gets rotated every timestep), and one that only gets its values updated.
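A minimal sketch of that rotating two-buffer pattern, with made-up names and sizes standing in for the real problem (I'm using dense arrays here for clarity, since `mul!` into a sparse destination is not generally supported):

```julia
using LinearAlgebra

# Two-buffer ("rotating") update: Fnext is overwritten in place each step,
# then the roles of F and Fnext are swapped, so no per-step allocation.
function evolve!(F, Fnext, R, O, nsteps)
    for _ in 1:nsteps
        mul!(Fnext, F, R)       # Fnext = F * R, computed in place
        Fnext .+= O             # add the per-step source term in place
        F, Fnext = Fnext, F     # rotate the buffers every timestep
    end
    return F                    # the buffer holding the latest state
end

row_dim, col_dim = 6, 5
R = diagm(-1 => ones(col_dim - 1))   # subdiagonal shift, like the thread's R
O = fill(0.5, row_dim, col_dim)
F = ones(row_dim, col_dim)
F = evolve!(F, similar(F), R, O, 100)
```

Wrapping the loop in a function also keeps the buffer swap out of global scope, which is both faster and avoids soft-scope surprises.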
julia> include("sparse.jl") # Original code
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 127.978 μs … 8.009 ms ┊ GC (min … max): 0.00% … 91.08%
Time (median): 282.130 μs ┊ GC (median): 0.00%
Time (mean ± σ): 330.082 μs ± 556.243 μs ┊ GC (mean ± σ): 17.04% ± 9.76%
█ ▆
███▇▄▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▁▂▂ ▂
128 μs Histogram: frequency by time 4.84 ms <
Memory estimate: 489.38 KiB, allocs estimate: 1910.
julia> include("sparse.jl") # Improved code
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 94.017 μs … 1.798 ms ┊ GC (min … max): 0.00% … 82.30%
Time (median): 131.914 μs ┊ GC (median): 0.00%
Time (mean ± σ): 148.972 μs ± 102.671 μs ┊ GC (mean ± σ): 5.10% ± 7.64%
▇▇▅▅▅██▆▅▄▃▂▂▁ ▁▁▁▂▂▂▂▂▂▂▁▁ ▃
███████████████████████████████▇▇▇▆▆▆▆▆▁▅▅▅▅▅▆▃▅▅▅▅▅▅▄▅▅▅▁▁▄▄ █
94 μs Histogram: log(frequency) by time 508 μs <
Memory estimate: 377.45 KiB, allocs estimate: 729.
Also is there any particular reason why you are using sparse matrices? For the dimension size you have given us, dense matrices might give a faster result. At the very least for the F matrix, which seems to be a dense matrix instead of a sparse matrix.
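One quick way to check is to convert to dense copies and time both variants. This sketch uses the dimensions from the thread with random stand-in data (a first run includes compilation time, so treat the numbers as rough):

```julia
using SparseArrays

# Stand-in matrices with the thread's dimensions and assumed nonzero counts.
R = sparse(2:33, 1:32, 1.0, 33, 33)
F = sprand(102, 33, 38 / (102 * 33))
O = sprand(102, 33, 6 / (102 * 33))

# Dense copies: at these sizes the dense kernels are often faster
# because they avoid the sparse index bookkeeping.
Fd, Rd, Od = Matrix(F), Matrix(R), Matrix(O)

t_sparse = @elapsed for _ in 1:1000; F * R .+ O; end
t_dense  = @elapsed for _ in 1:1000; Fd * Rd .+ Od; end
# Compare t_sparse and t_dense; rerun (or use BenchmarkTools) for fair numbers.
```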
That also gives me a considerable speed up in the computation.
Thank you very much for your efforts. It worked in this example, but it did not in my real code. I will check where the problem is.
In my real code, R is 33x33, F is 102x33, and O is 102x33.
You can update your example with the required dimensions, by making fake matrices with random values. For example, the below works and has the same dimensions as your original problem.
using SparseArrays

row_dim = 102
col_dim = 33
R = sparse(2:col_dim, 1:col_dim-1, 1, col_dim, col_dim)
n_entries = row_dim * col_dim # don't call this `size`, that shadows Base.size
O_density = 6 / n_entries
F_density = 38 / n_entries
O = sprand(row_dim, col_dim, O_density)
F = sprand(row_dim, col_dim, F_density)
for i = 1:100
    O.nzval .+= i
    F .= F * R .+ O
end
Note, I’m using density values based on the nonzero counts in your examples above.
I believe density is the key: if the matrix density is low enough, sparse matrices can still outperform dense matrices, even below the 128x128 size range.
In this problem, if the number of nonzeros stays the same as in the example code when moving to the larger dimensions, then sparse matrices are still faster at the new dimensions (102x33).
However, if you keep the same density (note that in the example code F is quite dense), then dense matrices are faster.
@Oscar_Smith@stevengj@Jeff_Emanuel@acxz Thank you very much for your efforts. Could you help me here please?
I found the cause of the problem. In my real code, I calculate O and then update F2. However, as can be seen, O is not the same as S. In O.nzval .= Ik; the filling order is column-based, while it is row-based in S. Consequently, F1 is not the same as F2.
Is there a way to add indices to O.nzval to correct the filling in O.nzval .= Ik;?
Change this to Nh = [6, 6, 6, 7, 7, 8]. When you use .nzval, it returns the nonzero values of the array ordered by the row and column indices, not in the original order you passed in.
Instead of using the .nzval field directly, it may be safer to use the nzrange function and iterate. See: Sparse Arrays · The Julia Language
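For reference, the documented pattern looks roughly like this: rowvals, nonzeros, and nzrange give safe column-by-column access to the stored entries (toy matrix below, chosen so the input order differs from the storage order):

```julia
using SparseArrays

# Toy matrix: entries A[1,1] = 10, A[3,1] = 30, A[2,2] = 20.
A = sparse([1, 3, 2], [1, 1, 2], [10.0, 30.0, 20.0], 3, 2)

rows = rowvals(A)             # row index of each stored entry
vals = nonzeros(A)            # stored values, in CSC order
for j in 1:size(A, 2)         # walk the matrix column by column
    for k in nzrange(A, j)    # positions of column j's stored entries
        i = rows[k]           # this slot holds A[i, j]
        vals[k] *= 2          # in-place update of A[i, j]
    end
end
```

Because you always know which (i, j) a slot belongs to, this avoids guessing the internal ordering of .nzval.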
Using sort is incorrect; it will not generalize.
Is there another way?
If you want to use the .nzval field (or the nonzeros function) directly, you need to make sure that Ik is in the order of the internal representation, which is the CSC ordering, instead of the CSR-like input ordering you used to create the sparse matrix. (I think this is true; you can make sure by checking the source code: https://github.com/JuliaLang/julia/blob/742b9abb4dd4621b667ec5bb3434b8b3602f96fd/stdlib/SparseArrays/src/sparsematrix.jl or just try assigning nzval with CSC ordering, and if it works, it works!)
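A quick check of this on a toy 2x2 matrix: entries passed in row-major order come back from nonzeros in column-major (CSC) order, not input order:

```julia
using SparseArrays

# Entries given in "row-major" input order:
I = [1, 1, 2, 2]
J = [1, 2, 1, 2]
V = [11.0, 12.0, 21.0, 22.0]
S = sparse(I, J, V, 2, 2)

# SparseMatrixCSC stores entries column by column, so nonzeros(S)
# returns [11.0, 21.0, 12.0, 22.0]: column 1 first, then column 2.
nonzeros(S)
```

So to assign through S.nzval (or nonzeros(S)), Ik would first have to be permuted into this column-major order.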
Using this iteration in my time-domain for loop is expensive.
Can you be more quantitative? What are the actual time differences? I would be surprised if the performance is drastically hurt.
Can you also comment on the density of the matrices that you are working with, specifically the F and O matrices? This information along with the dimensions will help us understand which is the appropriate data structure to use.