Switching two elements in array, vcat getting flagged as runtime-dispatch?

I currently have the following line of code in a simulation:

function exchangeBits(n::BitVector, i::Int, j::Int)
    # Returns a copy of n with the ith and jth bits exchanged
    if i < j
        m = n[vcat(1:(i - 1), j, i + 1:(j - 1), i, j + 1:end)]
    else
        m = n[vcat(1:(j - 1), i,  j + 1:(i - 1), j, i + 1:end)]
    end
    return m
end

Running the full simulation through @profview gives a Flamegraph with a substantial amount of space taken up by this function, and it looks like the following:

The way I understand this is that the function, in particular the usage of vcat, is somehow not type-stable and that I should find an alternative method. Is this correct? If so, what exactly is the problem with vcat in this context?

If I inspect the @code_warntype it claims to be type stable. But from the @btime below, I’d believe it’s actually unstable like the profiler suggests. EDIT: The internals of vcat appear to be unstable in this situation. It looks like a loop in LinearAlgebra._vcat is to blame.

In any case, I’ll recommend this much faster version:

function exchangeBits2(n::BitVector, i::Int, j::Int)
    m = copy(n)
    m[j],m[i] = m[i],m[j] # swap entries i and j
    return m
end
julia> using BenchmarkTools

julia> z = rand(500) .< 0.5;

julia> @btime exchangeBits($z,$10,$20);
  6.440 μs (108 allocations: 7.64 KiB)

julia> @btime exchangeBits2($z,$10,$20);
  39.697 ns (2 allocations: 160 bytes)

Dispatch aside, the big issue with the vcat-indexing version is that vcat builds a new array. You then use this new array to index n to make yet-another array. This is terribly wasteful. It’s also expensive to index a BitVector this way because of how it’s represented in memory (as individual bits, which take some effort to extract from the bytes that store them). Copying the whole thing and making the two little changes you want is much easier.

Opened as issue #48850. That issue currently can be avoided via this definition that replaces i and j in the vcat calls with i:i and j:j, so that there aren’t mixtures of Number and UnitRange arguments:

function exchangeBits1(n::BitVector, i::Int, j::Int)
    # Returns a copy of n with the ith and jth bits exchanged
    if i < j
        m = n[vcat(1:(i - 1), j:j, i + 1:(j - 1), i:i, j + 1:end)]
    else
        m = n[vcat(1:(j - 1), i:i,  j + 1:(i - 1), j:j, i + 1:end)]
    end
    return m
end
julia> @btime exchangeBits1($z,$10,$20);
  1.240 μs (8 allocations: 4.38 KiB)

Note that this is still much slower than the suggested exchangeBits2 given above.

1 Like