Thanks for your feedback! I’ll add a TODO for migrating from SparseArrays.
On the performance: the Finch high-level scheduler (which handles operations like + and *) is experimental and is built for generality. We're currently working on making the high-level scheduler use more specialized techniques and on adding parallelism. In the meantime, it can be faster to write the kernel directly in Finch IR with @finch, for example a Gustavson-style multiply that accumulates each output column into a sparse workspace:
julia> function faster_mul(A, B)
           w = Tensor(SparseByteMap(Element(0.0)))  # sparse workspace for one output column
           C = Tensor(CSCFormat())
           @finch begin
               C .= 0
               for j=_
                   w .= 0
                   for k=_, i=_; w[i] += A[i, k] * B[k, j] end  # accumulate column j of A*B
                   for i=_; C[i, j] = w[i] end  # scatter the workspace into C
               end
           end
           return C
       end
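(For reference, x and y below are SparseMatrixCSC matrices and X and Y are their Finch Tensor counterparts. Something like the following sets them up; the size and density here are illustrative assumptions, not the exact inputs from my run:)

julia> using Finch, SparseArrays

julia> n = 10_000; p = 0.001;  # illustrative size and density

julia> x = sprand(n, n, p); y = sprand(n, n, p);  # SparseArrays baselines

julia> X = Tensor(CSCFormat(), x); Y = Tensor(CSCFormat(), y);  # same data as Finch tensors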
julia> @time x * y; @time X * Y; @time faster_mul(X, Y);
2.599353 seconds (13 allocations: 1.663 GiB, 3.37% gc time)
14.993908 seconds (601 allocations: 20.312 GiB, 3.12% gc time, 0.00% compilation time)
4.000148 seconds (2.21 M allocations: 5.690 GiB, 2.28% gc time)
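To double-check that the hand-written kernel agrees with the library results, something like this should work (Finch tensors are AbstractArrays, so the generic elementwise comparison applies; it's slow, but fine for a one-off check):

julia> @assert faster_mul(X, Y) == x * y  # sanity check against the SparseArrays result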