Hi, I implemented a sparse syrk routine to see if I could get something faster than the naive implementation:
using SparseArrays
using LinearAlgebra
using Transducers
using Combinatorics
naive_spsyrk(s) = Symmetric(s * s')
spsyrk(s) =
zip(findnz(s)...) |>
PartitionBy(x -> x[2]) |>
MapCat(a -> with_replacement_combinations(a, 2)) |>
Map(x -> (x[1][1], x[2][1], x[1][3] * x[2][3])) |>
foldxl(TeeRF(
Map(x -> x[1])'(push!!),
Map(x -> x[2])'(push!!),
Map(x -> x[3])'(push!!),
)) |>
x -> sparse(x...) |>
Symmetric
Unsurprisingly this attempt was unsuccessful, although I did get to within a less than an order of magnitude:
julia> a = sprand(1000, 1000, 1e-3);
julia> @benchmark naive_spsyrk($a)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 22.148 μs … 1.286 ms ┊ GC (min … max): 0.00% … 95.01%
Time (median): 23.958 μs ┊ GC (median): 0.00%
Time (mean ± σ): 27.254 μs ± 45.658 μs ┊ GC (mean ± σ): 8.22% ± 4.83%
▂▆█▇▄▃▂▁ ▂
█████████▅▆█▇▅▄▁▄▄▃▃▁▁▁▃▁▁▁▁▁▄▁▅▅▅▄▃▁▁▃▁▁▁▁▁▁▁▁▁▃▅▅▆▆▆▆▆▆▇▇ █
22.1 μs Histogram: log(frequency) by time 61.6 μs <
Memory estimate: 130.81 KiB, allocs estimate: 14.
julia> @benchmark spsyrk($a)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 166.617 μs … 4.131 ms ┊ GC (min … max): 0.00% … 93.70%
Time (median): 173.007 μs ┊ GC (median): 0.00%
Time (mean ± σ): 214.630 μs ± 314.415 μs ┊ GC (mean ± σ): 15.76% ± 10.04%
█▃▂ ▁
████▇▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ █
167 μs Histogram: log(frequency) by time 2.92 ms <
Memory estimate: 547.00 KiB, allocs estimate: 5763.
But maybe some folks here have ideas on speeding this implementation up. (My personal guess is the PartitionBy
is the culprit here…)