Why the performance varies greatly after @view?

zhangjin · October 27, 2021, 4:27pm

I focus on the following code’s efficient, and I use the TimerOutputs to measure the performance.

using TimerOutputs
const to = TimerOutput()
abstract type AbstractSN end
    struct spi <: AbstractSN
        M::Int8
        Ks::Int8
        D::Int16
        Ds::Int16 
        pq_codebook::Array{<:AbstractFloat, 3} 
        pq_codes::Array{<:Integer, 2} 

        k_v::Int64
        vq_codebook::Array{<:AbstractFloat, 2}
        vq_codes::Array{<:Integer, 1} 
        metric::String
        id_index_table::Dict{Int64,Vector{Int64}}
    end

    function search_neighbors(spi::AbstractSN, q::Vector{<:AbstractFloat}, num_centroid_tosearch::Int, topk::Int)
        @timeit to "coarse_search" index,scores_v = coarse_search(spi, q, num_centroid_tosearch)
        @timeit to "id_indexed" ids,scores_vq = id_indexed(spi, index, scores_v)                

        @timeit to "compute_table" lookuptable = compute_table(spi.pq_codebook, q) # (M,Ks)
        @timeit to "ex1" pq_codes_ids = spi.pq_codes[ids,:]
        @timeit to "compute_scores_" scores_pq = compute_scores_(lookuptable, pq_codes_ids)
        @timeit to "ex2" scores = scores_vq + scores_pq;                
        
        @timeit to "ex3" len_s = length(scores);
        @timeit to "ex4" if topk > len_s
            i_ = maxk!(scores, len_s)
            ids_ = ids[i_]
            ids_ = [ids_;repeat([0],topk - len_s)] 
        else
            i_ = maxk!(scores, topk)
            ids_ = ids[i_]        
        end
        return ids_
    end

show(to)

You may don’t know the implementation of each methods above. To ease your reading burden, let’s look at the time evaluation first, I will fill in the details later.

I notice I should have used @view in the line “ex1”.

After I use the @view, there was an unexpected result.

I don’t know why the time of "compute_scores_ " increases so much.

The following is the codes of compute_scores_ and id_indexed

    function compute_scores_(lookuptable::Array{Float64,2}, pq_codes::AbstractArray{Int,2})
        n, M = size(pq_codes);
        scores = zeros(n);
        for i = 1:n
            s_ = 0;
            for j = 1:M
                @inbounds s_ = s_ + lookuptable[j, pq_codes[i,j]];
            end
            scores[i] = s_;
        end
        return scores    
    end

    function id_indexed(spi::AbstractSN, index::SubArray{Int64, 1}, scores_v::Vector{<:AbstractFloat})
        ids = [] 
        scores_vq = []
        for (i,s) in zip(index,scores_v)
            id_i = spi.id_index_table[i]  # spi.id_index_table::Dict{Int64,Vector{Int64}}
            ids = [ids;id_i]
            len = length(id_i)
            scores_vq = [scores_vq;repeat([s],len)]
        end
        ids = convert(Array{Int64,1}, ids)
        scores_vq = convert(Array{Float64,1}, scores_vq)
        return ids, scores_vq
    end

If you have any other suggestions for improving performance, please let me know.
Thanks in advance for any help/suggestions.

mbauman · October 27, 2021, 4:51pm

This isn’t terribly surprising to me. The difference between regular indexing and a view is that the former immediately copies all the selected rows into a new array whereas the latter just lazily holds onto the indices and does the lookup when it needs to.

You still need to do the lookup at some point. How much of an effect this has depends upon what kind of indices you use, your access patterns, and more. In this case, you’re selecting rows out of a huge array with a (potentially unsorted) list of arbitrary indices. This is likely causing all sorts of cache misses — and it’s those cache misses that are likely responsible for the 25 seconds worth of runtime whether you do it immediately or lazily.

Are the pq_codes_ids sorted? Could you sort them?

zhangjin · October 27, 2021, 5:23pm

@mbauman How can I sort pq_codes_ids ? I try to sort the ids, which seems not to work.

    function search_neighbors(spi::AbstractSN, q::Vector{<:AbstractFloat}, num_centroid_tosearch::Int, topk::Int)
        @timeit to "coarse_search" index,scores_v = coarse_search(spi, q, num_centroid_tosearch)
        @timeit to "id_indexed" ids, scores_vq = id_indexed(spi, index, scores_v)  

        ix_ = sortperm(ids)
        ids = ids[ix_]
        scores_vq = scores_vq[ix_]        

        # ids = 1:length(scores_vq)  # It will help reduce compute_scores_ time
        # ids = convert(Array{Int64,1}, ids)

        @timeit to "compute_table" lookuptable = compute_table(spi.pq_codebook, q) # (M,Ks)
        @timeit to "ex1" pq_codes_ids = @view spi.pq_codes[ids,:]
        @timeit to "compute_scores_" scores_pq = compute_scores_(lookuptable, pq_codes_ids)
        ...
    end

Topic		Replies	Views
When should you use views? Performance	9	2943	June 10, 2019
Performance of views vs. explicit index referencing Performance	2	501	July 7, 2020
Confused about performance interaction between @view and logical (Boolean) indexing Performance	7	1448	April 15, 2019
Why operation on return of @views take a long time Performance	7	188	September 6, 2024
Using views slows down my code New to Julia question , performance	5	467	October 23, 2020

Why the performance varies greatly after @view?

Related topics