Question on performance of views

In some of our code we use views based on integer indexing (we need to remove some columns from our matrices). I noticed that these views have very poor performance in multiplication.

using BenchmarkTools, InvertedIndices

a = rand(1000, 1000)
@btime a*a
# 13 ms

av1 = @view a[1:1000, 1:1000]  # range-based view
@btime av1*av1
# 13 ms

av2 = @view a[Not(12), Not(12)]  # inverted-index view
@btime av2*av2
# 733 ms

@btime Matrix(av2)*Matrix(av2)
# 19 ms

I think it's because the view with integer indices / inverted indices is not a StridedArray anymore…
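You can check this directly with the views from above:

julia> av1 isa StridedArray   # range-based view: BLAS can handle it
true

julia> av2 isa StridedArray   # inverted-index view: falls back to generic matmul
false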

Is there a smarter way to remove certain columns/rows from my matrices and still have good multiplication performance? Right now the best way is to explicitly turn them into a “regular” Matrix before multiplying.

I looked at StridedViews, but it looks like it only works on ranges and not on integer indices/inverted indices. Does someone have some advice? Should I just make the copy with Matrix, or is there some package that can solve this for me? :slight_smile:

InvertedIndices.jl (Not) has some embarrassingly bad performance. Like it’s the kind of thing that randomly makes me wince and go red in the face when I think about it. But this is not entirely that. This is just plain old indirection and generic (not BLAS-optimized) linear algebra.

It’s exactly the same as:

julia> using BenchmarkTools

julia> a = rand(1000, 1000);

julia> av3 = @view a[[begin:11; 13:end], [begin:11; 13:end]];

julia> @btime $av3*$av3;
  303.146 ms (8 allocations: 7.64 MiB)

julia> @btime $(Matrix(av3))*$(Matrix(av3));
  8.790 ms (2 allocations: 7.61 MiB)

Copying (or not-view-ing) is a great solution.
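By “not-view-ing” I mean that plain indexing (dropping the @view) already materializes a dense Matrix, so no explicit conversion is needed:

julia> b = a[[begin:11; 13:end], [begin:11; 13:end]];  # plain indexing copies

julia> b isa Matrix{Float64}
true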

Do you know if this is inherent to what Not wants to accomplish, or if there’s room for improvement?

While copying to a (possibly preallocated) new place is certainly the easiest and most flexible option, in simple cases (like the one you showed) you could also try to multiply the blocks instead.
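Removing row and column index partitions the matrix into blocks R = [A B; C D], and the blockwise square is

R² = [A*A + B*C   A*B + B*D;
      C*A + D*C   C*B + D*D]

which is what the mul! calls below compute: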

using InvertedIndices, LinearAlgebra

function square_with_removed_colrow_II(mat, index)
    reduced_mat = @view mat[Not(index), Not(index)]
    return reduced_mat*reduced_mat
end

function square_with_removed_colrow_II_copy(mat, index)
    reduced_mat = Matrix(@view mat[Not(index), Not(index)])
    return reduced_mat*reduced_mat
end

@views function square_with_removed_colrow_blockwise(mat, index)
    blockindex1 = 1:index-1
    blockindex2 = index+1:size(mat,1) # mat should be square for this code
    # blocks around the removed row/column
    A = mat[blockindex1, blockindex1]
    B = mat[blockindex1, blockindex2]
    C = mat[blockindex2, blockindex1]
    D = mat[blockindex2, blockindex2]

    res = zeros(size(mat,1)-1, size(mat,2)-1)
    r1 = blockindex1 # index ranges in the resulting matrix
    r2 = blockindex2 .- 1

    # compute blockwise; the trailing (true, true) means the result of
    # the mul is added onto the existing values
    mul!(res[r1,r1], A, A)
    mul!(res[r1,r1], B, C, true, true)
    mul!(res[r1,r2], A, B)
    mul!(res[r1,r2], B, D, true, true)
    mul!(res[r2,r1], C, A)
    mul!(res[r2,r1], D, C, true, true)
    mul!(res[r2,r2], C, B)
    mul!(res[r2,r2], D, D, true, true)
    return res
end

julia> @btime square_with_removed_colrow_II($a, $12);
  529.318 ms (11984 allocations: 8.00 MiB)

julia> @btime square_with_removed_colrow_II_copy($a, $12);
  8.859 ms (11980 allocations: 15.58 MiB)

julia> @btime square_with_removed_colrow_blockwise($a, $12);
  9.936 ms (2 allocations: 7.61 MiB)

But as the benchmarks show, this is actually a bit slower than copying once.
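If this runs in a hot loop, you could go one step further and reuse a preallocated buffer for the copy. A minimal sketch (the function name square_with_removed_colrow_prealloc! is just illustrative, not from any package):

using InvertedIndices, LinearAlgebra

function square_with_removed_colrow_prealloc!(out, buf, mat, index)
    copyto!(buf, @view mat[Not(index), Not(index)]) # one copy into a reusable dense buffer
    return mul!(out, buf, buf)                      # dense inputs, so this dispatches to BLAS
end

# usage: allocate out and buf once, reuse across calls
# out = similar(a, 999, 999); buf = similar(out)
# square_with_removed_colrow_prealloc!(out, buf, a, 12)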

No, it’s not fundamental. It’s trying to be one step too clever at avoiding allocations à la logical indices. Fundamentally, that should be possible, but it’s hard. That said, it’s currently slower than the naive copy approach, IIRC. It’s been a long time since I’ve looked at it.
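For reference, the logical-indexing route alluded to here looks like this (also a copying operation, using the same a as above):

julia> mask = trues(size(a, 1)); mask[12] = false;

julia> a[mask, mask] == a[Not(12), Not(12)]  # both produce a fresh 999×999 Matrix
true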