Exploring Efficient not in Operations in Julia: Benchmarking Different Methods

0nyr · January 13, 2025, 3:14pm

So, it turns out there are a number of ways to write code in Julia that checks if a scalar is or is not within a vector.

See benchmark repo.

Exploring Efficient `not in` Operations in Julia

Introduction

Julia provides multiple ways to implement a not in (\notin) operation, which checks whether a scalar is not present in a vector. While the syntax !(scalar in vector) is straightforward, there are alternative methods like using any, all, and manual loops, each with potential trade-offs in performance.

I decided to benchmark different implementations to evaluate their efficiency under various scenarios. In this post, I will:

Present the methods I tested.
Outline the benchmarking process I followed.
Seek advice from the Julia community on potential improvements or overlooked methods.

Methods Tested

Here are the five approaches I tested:

!(node in route): The standard not in operation in Julia.
!any(isequal(node), route): Uses any to check for equality, then negates the result.
all(x -> x != node, route): Verifies that all elements in the vector are not equal to the scalar.
Manual for loop: Iterates through the vector and breaks early if a match is found.
SIMD-optimized for loop: A manual loop using the @simd macro for potential vectorization and faster execution.

Benchmarking Setup

The benchmarking script:

Generates random-sized vectors (routes) containing integers between 1 and NB_VERTICES.
Tests each method against random scalar values (nodes).
Measures execution time using the BenchmarkTools.jl package over multiple iterations.

Key Parameters

NB_VERTICES: Total number of unique vertices.
NB_ROUTES: Number of routes (vectors) to generate.
MAX_ROUTE_SIZE: Maximum size of each vector.
ITERATIONS: Number of benchmarking repetitions.

Discussion Points

Are there any better approaches to write a not in operation that I may have missed?
For small and large vectors, which method should theoretically be the most efficient? Why?
How would SIMD optimizations (@simd) impact this operation in practice?

I need to optimize this operation for some custom Dynamic-Programming algorithms.

Request for Feedback

I’d love to hear your thoughts on:

The choice of methods I benchmarked.
Additional ideas for writing efficient not in operations in Julia.
Any relevant insights into optimizing this operation, particularly with respect to memory and computational efficiency.

Feel free to share your expertise or suggest any improvements to my benchmarking approach. Let’s discuss the best way to handle such operations in Julia! You can review my code, and propose improvements on the GitHub repo

Results

Here are results with a decent amount of iterations:

Method: !in Results:  Mean: 4.741224284129747e-7 Std: 1.905167918928099e-8
Method: any Results:  Mean: 4.74334978316611e-7 Std: 1.5981341335909413e-8
Method: all Results:  Mean: 4.7388496802150037e-7 Std: 1.7295353118322162e-8
Method: for Results:  Mean: 4.747417947615649e-7 Std: 1.7384824081541754e-8
Method: sfor Results: Mean: 4.759807047715669e-7 Std: 1.9178525661399665e-8

So it seems there is not really any significant difference between those alternative writings.

Oscar_Smith · January 13, 2025, 3:20pm

If this is a workload you care about, you really shouldn’t be using a Vector. a Set or tree-based structure will do these operations way way way faster.

StefanKarpinski · January 13, 2025, 3:52pm

I’m relieved by this conclusion. These should all be equivalent unless one of them is not getting optimized appropriately.

mtfishman · January 13, 2025, 5:00pm

For a hash-based approach with a nice interface, see AcceleratedArrays.jl.

sumiya11 · January 13, 2025, 9:32pm

How would SIMD optimizations (@simd) impact this operation in practice?

If only using Vectors, this implementation seems faster by 30-40% than !(x in X) for length 1000 on my PC:

sum(X .== x) == 0

Probably due to easier SIMD. This does two passes over data, so I expect a hand-written loop to gain another 2x.

0nyr · January 14, 2025, 10:47am

I guess you are speaking about when X is a Vector of Vectors or Matrix. I will test the not in in this situation later.

So for now, I have tested:

function method_1(node, route)
    !(node in route)
end

function method_2(node, route)
    !any(isequal(node), route)
end

function method_3(node, route)
    all(x -> x != node, route)
end

function method_4(node, route)
    for r in route
        if r == node
            return false
        end
    end
    return true
end

function method_5(node, route)
    @simd for r in route
        if r == node
            return false
        end
    end
    return true
end

function method_6(node, route)
    return sum(route .== node) == 0
end

function method_7(node, route)
    sum = 0
    @simd for r in route
            sum += r == node
    end
    return sum == 0
end

Where I name each method according to:

METHODS::Vector{Tuple{String, Function}} = [
    ("!in", method_1),
    ("any", method_2),
    ("all", method_3),
    ("for", method_4),
    ("sfor", method_5),
    ("sum", method_6),
    ("ssum", method_7),
]

And I get:

Method: !in Mean: 4.76043449380637e-7 Std: 1.671772030037151e-8
Method: any Mean: 4.748514428584657e-7 Std: 1.8416113584003793e-8
Method: all Mean: 4.773872523225068e-7 Std: 1.565677259120496e-8
Method: for Mean: 4.775637474296272e-7 Std: 1.840971015276048e-8
Method: sfor Mean: 4.771978918301575e-7 Std: 1.8409102121552966e-8
Method: sum Mean: 9.987918e-5 Std: 5.949062057266853e-6
Method: ssum Mean: 4.768339770688275e-7 Std: 1.918197393282483e-8

Which looks like this:

So again, all methods are equivalent, except for sum method sum(route .== node) == 0 which is clearly slower.

DNF · January 14, 2025, 12:14pm

You get the same vectorization, without allocation, by doing

sum(==(x), X)

or, even a bit faster (for some reason)

count(==(x), X)

Early bailout with any is good when x is present, but will ruin simd. A hand-vectorized early bailout version is probably optimal.

julia> @btime sum($x .== $X);
  110.394 ns (3 allocations: 224 bytes)

julia> @btime count(==($x), $X);
  62.245 ns (0 allocations: 0 bytes)

This is with X of length 1024.

foobar_lv2 · January 14, 2025, 2:57pm

It’s also relevant to note that your benchmark code is invalid in the sense of PSA: Microbenchmarks remember branch history

This doesn’t matter for your specific code, because it’s not branch-heavy. But you absolutely need a different benchmarking setup.

sumiya11 · January 14, 2025, 3:07pm

Thanks for checking! I imagine GC has an effect on aggregate timings, since method_6 allocates. To me, that is expected, but the extent of the slowdown is a bit surprising (200x, I guess?).

I wonder what timings you get for

using BenchmarkTools
X = rand(Int, 1000);
x = rand(Int);
@btime method_1($x, $X);
@btime method_6($x, $X);

On my laptop:

  601.136 ns (0 allocations: 0 bytes)
  152.242 ns (3 allocations: 224 bytes)

sumiya11 · January 14, 2025, 3:13pm

julia> @btime count(==($x), $X);
  81.612 ns (0 allocations: 0 bytes)

Ah, the llvm ir is so neat, nice! (EDIT: LOL. Not IR, assembly, too little sleep, sorry.)

.LBB0_6:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        vpcmpeqq        ymm5, ymm2, ymmword ptr [rdx + 8*rax + 8]
        vpsubq  ymm0, ymm0, ymm5
        vpcmpeqq        ymm5, ymm2, ymmword ptr [rdx + 8*rax + 40]
        vpsubq  ymm1, ymm1, ymm5
        vpcmpeqq        ymm5, ymm2, ymmword ptr [rdx + 8*rax + 72]
        vpsubq  ymm3, ymm3, ymm5
        vpcmpeqq        ymm5, ymm2, ymmword ptr [rdx + 8*rax + 104]
        vpsubq  ymm4, ymm4, ymm5
        add     rax, 16
        cmp     rsi, rax
        jne     .LBB0_6

foobar_lv2 · January 14, 2025, 3:43pm

FWIW, @DNF’s proposal:

julia> function block_in(needle, haystack)
       found = 0
       len = length(haystack)
       for i=1:(len>>6)
       start = 1 + (i-1)<<6
       stop = min(len, i <<6)
       @simd for j=start:stop
       @inbounds straw = haystack[j]
       found += needle==straw
       end
       found > 0 && return true
       end
julia> r=zeros(Int, 100_000); @btime in(1, $r); @btime block_in(1, $r); @btime count($(==(1)), $r)
  29.678 μs (0 allocations: 0 bytes)
  13.516 μs (0 allocations: 0 bytes)
  11.892 μs (0 allocations: 0 bytes)

Topic		Replies	Views
Some_thing in some_list New to Julia question	11	489	January 29, 2024
Julia's in.() seems slow compared to R's %in% Performance	40	3019	May 9, 2019
Testing whether value is (or not) in array General Usage question	4	538	April 23, 2020
If condition with a vector comparison New to Julia question	7	263	June 26, 2024
Possibility to check `∈` and `∉` in the `===` sense Internals & Design	5	659	July 24, 2018