According to my benchmarks, it looks like it is much faster for the in-place shiftleft!
, but still x3 faster for the out-of-place shiftleft
. Also, my approach looks like is almost not affected by the percentage of non-zero values, vs. the other that do are affected.
julia> using SparseArrays, LinearAlgebra, BenchmarkTools
julia> function shiftleft!(A::SparseMatrixCSC, n)
sz = size(A)
n <= sz[2] || throw(DimensionMismatch("$(n) bigger than the number of columns $(size[2])"))
colnz = A.colptr[n+1] - 1
@view(A.colptr[begin:end-n]) .= @view(A.colptr[begin+n:end])
A.colptr[end-n+1:end] .= A.colptr[end]
A.colptr .-= colnz
deleteat!(A.rowval, 1:colnz)
deleteat!(A.nzval, 1:colnz)
A
end
shiftleft! (generic function with 1 method)
julia> shiftleft(A, n) = shiftleft!(copy(A), n)
shiftleft (generic function with 1 method)
julia> shiftleft_mul(A, n) = A * spdiagm(-n => ones(Int, size(A, 2)-n))
shiftleft_mul (generic function with 1 method)
julia> function shiftleft_slide(A::SparseMatrixCSC, n)
A[:, begin:begin+n-1] .= zero(eltype(A))
dropzeros!(A)
A[:, vcat(begin+n:end, begin:begin+n-1)]
end
shiftleft_slide (generic function with 1 method)
Results in
julia> @benchmark shiftleft!(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 803 evaluations.
Range (min β¦ max): 150.450 ns β¦ 1.189 ΞΌs β GC (min β¦ max): 0.00% β¦ 66.28%
Time (median): 164.403 ns β GC (median): 0.00%
Time (mean Β± Ο): 193.630 ns Β± 89.107 ns β GC (mean Β± Ο): 3.87% Β± 8.50%
βββββββββββββββββββ ββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββ
βββ
β
ββββββ β
150 ns Histogram: log(frequency) by time 697 ns <
Memory estimate: 816 bytes, allocs estimate: 1.
julia> @benchmark shiftleft!(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 800 evaluations.
Range (min β¦ max): 145.001 ns β¦ 2.450 ΞΌs β GC (min β¦ max): 0.00% β¦ 84.63%
Time (median): 167.564 ns β GC (median): 0.00%
Time (mean Β± Ο): 214.467 ns Β± 130.594 ns β GC (mean Β± Ο): 4.92% Β± 8.98%
βββββββββββββββββββ β
ββββββββββββββββββββββββββ
ββββββ
β
β
β
βββββββββ
β
β
ββ
β
β
βββββββ
ββββ β
145 ns Histogram: log(frequency) by time 860 ns <
Memory estimate: 816 bytes, allocs estimate: 1.
julia> @benchmark shiftleft(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min β¦ max): 1.769 ΞΌs β¦ 297.860 ΞΌs β GC (min β¦ max): 0.00% β¦ 96.98%
Time (median): 2.857 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 4.596 ΞΌs Β± 10.683 ΞΌs β GC (mean Β± Ο): 11.58% Β± 5.31%
ββββββ
ββββββββ β βββ β
βββββββββββββββββββββ
β
β
β
β
ββββββββββββββββββββββ
βββββ
β
β
ββ
ββ
β
β
1.77 ΞΌs Histogram: log(frequency) by time 20.2 ΞΌs <
Memory estimate: 15.80 KiB, allocs estimate: 4.
julia> @benchmark shiftleft(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min β¦ max): 11.614 ΞΌs β¦ 808.363 ΞΌs β GC (min β¦ max): 0.00% β¦ 94.41%
Time (median): 14.659 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 18.201 ΞΌs Β± 35.230 ΞΌs β GC (mean Β± Ο): 10.51% Β± 5.41%
ββ
βββββββββββββ ββββββ β
ββββββββββββββββββββββββββββββββ
ββββββββ
βββββββββββ
βββββββββ β
11.6 ΞΌs Histogram: log(frequency) by time 48.1 ΞΌs <
Memory estimate: 140.52 KiB, allocs estimate: 6.
julia> @benchmark shiftleft_mul(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min β¦ max): 17.919 ΞΌs β¦ 1.694 ms β GC (min β¦ max): 0.00% β¦ 95.68%
Time (median): 22.169 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 25.795 ΞΌs Β± 36.031 ΞΌs β GC (mean Β± Ο): 3.66% Β± 2.65%
β
ββββββ
ββββ
ββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββ β
17.9 ΞΌs Histogram: frequency by time 41.7 ΞΌs <
Memory estimate: 25.09 KiB, allocs estimate: 19.
julia> @benchmark shiftleft_mul(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min β¦ max): 51.419 ΞΌs β¦ 938.537 ΞΌs β GC (min β¦ max): 0.00% β¦ 86.56%
Time (median): 55.294 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 62.974 ΞΌs Β± 43.053 ΞΌs β GC (mean Β± Ο): 4.06% Β± 5.75%
βββββ
β
ββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββ
ββ
β
β
ββββ
βββββββ
ββββ β
51.4 ΞΌs Histogram: log(frequency) by time 127 ΞΌs <
Memory estimate: 246.69 KiB, allocs estimate: 23.
julia> @benchmark shiftleft_slide(A, 10) setup=(A=sprand(100, 100, 0.1))
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
Range (min β¦ max): 4.884 ΞΌs β¦ 230.295 ΞΌs β GC (min β¦ max): 0.00% β¦ 88.43%
Time (median): 5.887 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 6.586 ΞΌs Β± 8.344 ΞΌs β GC (mean Β± Ο): 5.40% Β± 4.16%
βββββββββ
β
βββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββ β
4.88 ΞΌs Histogram: frequency by time 9.59 ΞΌs <
Memory estimate: 14.62 KiB, allocs estimate: 4.
julia> @benchmark shiftleft_slide(A, 10) setup=(A=sprand(100, 100, 0.9))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min β¦ max): 27.227 ΞΌs β¦ 927.842 ΞΌs β GC (min β¦ max): 0.00% β¦ 90.21%
Time (median): 30.218 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 37.314 ΞΌs Β± 37.987 ΞΌs β GC (mean Β± Ο): 4.89% Β± 4.92%
βββββ
β
β
ββββββββββββββββββββββββ ββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βββββ
β
β
β
27.2 ΞΌs Histogram: log(frequency) by time 79 ΞΌs <
Memory estimate: 126.72 KiB, allocs estimate: 6.