# XOR when indexing arrays

In a tutorial about computer hardware (https://viralinstruction.com/posts/hardware/), I came across the following Julia code:

```julia
function foo(x)
    s = zero(eltype(x))
    @inbounds for i in eachindex(x)
        s = x[i ⊻ s]
    end
    return s
end;
```

What is the point of the author’s frequent use of the XOR operator when indexing arrays?
The result of `i ⊻ s` is the number whose bits are 1 where the corresponding bits of `i` and `s` differ, and 0 where they agree. I just don’t see the connection to using this as an index…
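For concreteness, here is what I understand the operator to do on two small bit patterns:

```julia
a = 0b1100
b = 0b1010
a ⊻ b == 0b0110  # true: the two middle bits differ, the outer bits agree
```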

On another note, why does he need to use `eltype(x)` when he just wants to use a 0 for indexing an array?

`x` is intended to be an array, `eltype(x)` gets the type of `x`’s elements, and `zero(eltype(x))` makes a zero of that type. Calling `foo([UInt8(1)])` would then start with `s = UInt8(0)`.

A literal `0` always has the type `Int`. That would give `s` more than one concrete type over the course of the method, i.e. a type instability that forces the compiler to emit extra instructions to handle. Two types don’t hurt performance that much, but idioms like `zero(eltype(x))` are common in order to preserve type stability as much as possible.
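A minimal sketch of the difference, mirroring the loop’s structure (the function names here are made up for illustration):

```julia
function last_elem_literal(x)
    s = 0                  # s::Int here...
    for i in eachindex(x)
        s = x[i]           # ...but s::eltype(x) here: two concrete types
    end
    return s
end

function last_elem_stable(x)
    s = zero(eltype(x))    # s has the element type throughout
    for i in eachindex(x)
        s = x[i]
    end
    return s
end
```

In the first version `s` can be an `Int` (empty input) or the element type (otherwise), so the inferred return type is a `Union`; in the second it is always `eltype(x)`.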

I think it’s only there to generate instructions for the demonstration. Assuming `x` is a `UInt` vector that starts with `UInt(1)`, we begin with `s = UInt(0)` and `i = 1`, thus `i ⊻ s == 1` and `s = x[1] == UInt(1)`. The next iteration starts with `s = UInt(1)` and `i = 2`, thus `i ⊻ s == 3` and `s = x[3]`. There you can see the problem: if we only had 2 elements to begin with, we index out of bounds. The `@inbounds` macro removes the bounds check and is intended to improve performance when the coder knows the indices can’t go out of bounds. Here it can easily access memory unsafely, and I can’t see a practical use for it.
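You can watch the indices escape the array by tracing the loop by hand (same logic as `foo`, but with the bounds check left in and the indices printed):

```julia
function trace_foo(x)
    s = zero(eltype(x))
    for i in eachindex(x)
        idx = i ⊻ s
        println("i = $i, s = $s, index = $idx")
        s = x[idx]   # with only two elements this throws on the second pass
    end
    return s
end

try
    trace_foo(UInt[1, 2])  # i = 1 hits index 1; i = 2 computes 2 ⊻ 1 == 3
catch e
    println("caught: ", typeof(e))  # BoundsError: index 3 into a 2-element vector
end
```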


Ping @jakobnissen

I don’t know about the posted code (which might be a toy example?), but there are other practical algorithms that do complicated-looking xor-based calculations to repeatedly update array indices in a loop, e.g. CRC32 checksums.


It definitely helps that the table has 256 elements and the index is always a `UInt8` value plus `1`, so the lookup can never go out of bounds.
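For reference, the table-driven idiom looks roughly like this (a textbook sketch of the standard reflected CRC-32 with polynomial `0xEDB88320`, not code from the blog post):

```julia
# Build the 256-entry lookup table for the reflected CRC-32 polynomial.
CRC_TABLE = let poly = 0xEDB88320
    [begin
         c = UInt32(i)
         for _ in 1:8
             c = isodd(c) ? (c >>> 1) ⊻ poly : c >>> 1
         end
         c
     end for i in 0x00:0xff]
end

function crc32(data::Vector{UInt8})
    crc = 0xffffffff
    for b in data
        # The table index is a UInt8-valued xor, plus 1 for Julia's 1-based arrays.
        crc = (crc >>> 8) ⊻ CRC_TABLE[((crc ⊻ b) & 0xff) + 1]
    end
    return crc ⊻ 0xffffffff
end
```

Because `((crc ⊻ b) & 0xff) + 1` is always in `1:256`, the bounds check can safely be elided here.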

Author here. I use xor for a few different reasons. In the `random_access` function, I use it in order to randomize the index at every iteration to defeat the prefetcher. I could use `rand` to randomize, or similar, but I wanted to make sure that the time spent randomizing the indices would be dwarfed by the cache misses (which honestly would probably still be the case), and xor + a shift is fast.
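Schematically, the update looks like this (a simplified sketch, not the exact code from the post; it assumes `length(data)` is a power of two so the mask keeps indices in range):

```julia
function random_access_sketch(data::Vector{UInt}, n_iters::Integer)
    mask = UInt(length(data) - 1)   # works because the length is a power of two
    n = one(UInt)
    @inbounds for _ in 1:n_iters
        # Shift and xor with the loaded value: only a couple of cheap
        # instructions, but the resulting index hops around unpredictably,
        # so the hardware prefetcher cannot guess the next cache line.
        n = (n >>> 7) ⊻ data[(n & mask) + 1]
    end
    return n
end
```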

In the `linear_access` function, I don’t want to sabotage the prefetcher, but I do want the function to have assembly as similar as possible to `random_access`, so that when they are contrasted, the difference in performance comes purely from whether the prefetcher is able to do its work or not.

I also want to defeat other optimizations, such as autovectorisation, which is not introduced until later in the blog. I believe `alignment_test` and `foo` would both vectorise if I didn’t insert an xor.

More generally, the examples in the blog post had to be constructed a little carefully, because in each section, I introduce a single concept and want the performance of that function to explore that concept only. In reality, the performance of a function is impacted by many different aspects. So if you’re not careful, you end up with examples impacted by multiple different concepts, some of which have not yet been explained, and you would have to write something like:

> In this example, `foo` is faster than `bar` because it exploits concept X. Okay, actually the output shows `bar` is faster, but this is due to a different memory dependency in the code that messes up instruction-level parallelism in `foo`. Sure, you don’t know what that means until the next chapter, but trust me bro, concept X matters a lot and `foo` ought to be faster even though it isn’t.


Thank you for the explanation. I did not expect an answer from the author himself!

Thanks for the explanation!