So, it turns out there are a number of ways to write code in Julia that checks if a scalar is or is not within a vector.
See benchmark repo.
Exploring Efficient not in
Operations in Julia
Introduction
Julia provides multiple ways to implement a not in
(\notin) operation, which checks whether a scalar is not present in a vector. While the syntax !(scalar in vector)
is straightforward, there are alternative methods like using any
, all
, and manual loops, each with potential trade-offs in performance.
I decided to benchmark different implementations to evaluate their efficiency under various scenarios. In this post, I will:
- Present the methods I tested.
- Outline the benchmarking process I followed.
- Seek advice from the Julia community on potential improvements or overlooked methods.
Methods Tested
Here are the five approaches I tested:
!(node in route)
: The standardnot in
operation in Julia.!any(isequal(node), route)
: Usesany
to check for equality, then negates the result.all(x -> x != node, route)
: Verifies that all elements in the vector are not equal to the scalar.- Manual
for
loop: Iterates through the vector and breaks early if a match is found. - SIMD-optimized
for
loop: A manual loop using the@simd
macro for potential vectorization and faster execution.
Benchmarking Setup
The benchmarking script:
- Generates random-sized vectors (
routes
) containing integers between 1 andNB_VERTICES
. - Tests each method against random scalar values (
nodes
). - Measures execution time using the BenchmarkTools.jl package over multiple iterations.
Key Parameters
NB_VERTICES
: Total number of unique vertices.NB_ROUTES
: Number of routes (vectors) to generate.MAX_ROUTE_SIZE
: Maximum size of each vector.ITERATIONS
: Number of benchmarking repetitions.
Discussion Points
- Are there any better approaches to write a
not in
operation that I may have missed? - For small and large vectors, which method should theoretically be the most efficient? Why?
- How would SIMD optimizations (
@simd
) impact this operation in practice?
I need to optimize this operation for some custom Dynamic-Programming algorithms.
Request for Feedback
I’d love to hear your thoughts on:
- The choice of methods I benchmarked.
- Additional ideas for writing efficient
not in
operations in Julia. - Any relevant insights into optimizing this operation, particularly with respect to memory and computational efficiency.
Feel free to share your expertise or suggest any improvements to my benchmarking approach. Let’s discuss the best way to handle such operations in Julia! You can review my code, and propose improvements on the GitHub repo
Results
Here are results with a decent amount of iterations:
Method: !in Results: Mean: 4.741224284129747e-7 Std: 1.905167918928099e-8
Method: any Results: Mean: 4.74334978316611e-7 Std: 1.5981341335909413e-8
Method: all Results: Mean: 4.7388496802150037e-7 Std: 1.7295353118322162e-8
Method: for Results: Mean: 4.747417947615649e-7 Std: 1.7384824081541754e-8
Method: sfor Results: Mean: 4.759807047715669e-7 Std: 1.9178525661399665e-8
So it seems there is not really any significant difference between those alternative writings.