Hi all,
Pretty straightforward – I’d like to write some code that performs a matvec between a matrix A
and a vector x
, except I know ahead of time that I don’t need all of the rows of A
. In other words, instead of performing A*x
, I only really need to compute A[rows,:] * x
.
In practice, though, the former is much, much faster to compute than the latter:
julia> using BenchmarkTools
julia> W = randn(256, 256);
julia> rows = findall(randn(256) .≥ 0); # random subset of ≈ 1/2 of the rows of W
julia> @benchmark W*x setup=(x=randn(256))
BenchmarkTools.Trial:
memory estimate: 2.13 KiB
allocs estimate: 1
--------------
minimum time: 7.619 μs (0.00% GC)
median time: 9.685 μs (0.00% GC)
mean time: 17.447 μs (35.49% GC)
maximum time: 62.014 ms (99.85% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark W[rows,:]*x setup=(x=randn(256))
BenchmarkTools.Trial:
memory estimate: 267.28 KiB
allocs estimate: 5
--------------
minimum time: 86.810 μs (0.00% GC)
median time: 98.906 μs (0.00% GC)
mean time: 118.478 μs (9.77% GC)
maximum time: 62.002 ms (99.51% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark view(W,rows,:)*x setup=(x=randn(256)) # Try using view to avoid copying data?
BenchmarkTools.Trial:
memory estimate: 1.25 KiB
allocs estimate: 4
--------------
minimum time: 118.659 μs (0.00% GC)
median time: 124.240 μs (0.00% GC)
mean time: 127.680 μs (0.72% GC)
maximum time: 9.343 ms (98.37% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark( # Try @inbounds?
begin
@inbounds W[rows,:]*x
end,
setup=(x=randn(256))
)
BenchmarkTools.Trial:
memory estimate: 271.28 KiB
allocs estimate: 5
--------------
minimum time: 94.149 μs (0.00% GC)
median time: 97.437 μs (0.00% GC)
mean time: 115.970 μs (9.84% GC)
maximum time: 61.165 ms (99.49% GC)
--------------
samples: 10000
evals/sample: 1
Unfortunately, for my application I can’t simply store W[rows,:]
in a variable since rows
changes frequently.
I’ve tried other things to speed this computation up, e.g. storing V = Matrix(W')
and performing V[:,rows]' * x
instead of W[rows,:] * x
so that I’m indexing into columns rather than rows, and this gives a decent speedup:
julia> V = Matrix(W')
julia> @benchmark W[rows,:]*x setup=(x=randn(256))
BenchmarkTools.Trial:
memory estimate: 271.28 KiB
allocs estimate: 5
--------------
minimum time: 81.037 μs (0.00% GC)
median time: 100.251 μs (0.00% GC)
mean time: 117.961 μs (10.63% GC)
maximum time: 62.720 ms (99.51% GC)
--------------
samples: 10000
evals/sample: 1
julia> @benchmark V[:,rows]'*x setup=(x=randn(256))
BenchmarkTools.Trial:
memory estimate: 271.34 KiB
allocs estimate: 8
--------------
minimum time: 47.782 μs (0.00% GC)
median time: 64.861 μs (0.00% GC)
mean time: 82.015 μs (15.47% GC)
maximum time: 62.494 ms (99.56% GC)
--------------
samples: 10000
evals/sample: 1
Still, it’s way slower than computing W*x
.
Does anybody have any suggestions for how to speed up this computation? I would greatly appreciate any input you can provide!
Thanks!