Why is this code slow?

Cf. PSA: Microbenchmarks remember branch history

It is even worse here, since your CDF does not fit into the L1 cache. You can probably get a significant speedup by storing the implicit search tree in a cache-oblivious layout instead of as a linearly ordered vector.
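To make the layout idea concrete, here is a rough sketch. Note that it uses the Eytzinger (breadth-first) layout, which is simpler than a truly cache-oblivious van Emde Boas layout but follows the same principle of storing the implicit tree so that the top levels stay hot in cache. The names `eytzinger`, `fill_inorder!` and `eytzinger_searchfirst` are made up for this example, and the code assumes a sorted, 1-based input vector. I have not benchmarked it against your code.

```julia
# Recursively place the sorted values into breadth-first (Eytzinger) order.
# Node k has children 2k and 2k + 1; an in-order walk of that implicit tree
# visits the nodes in sorted order. Returns the next unused index of `sorted`.
function fill_inorder!(tree, pos, sorted, k, next)
    k > length(tree) && return next
    next = fill_inorder!(tree, pos, sorted, 2k, next)
    tree[k] = sorted[next]
    pos[k] = next                 # remember where this value sat in `sorted`
    return fill_inorder!(tree, pos, sorted, 2k + 1, next + 1)
end

# Hypothetical helper: returns the reordered values and, for every tree node,
# its position in the original sorted vector.
function eytzinger(sorted::AbstractVector{T}) where {T}
    tree = Vector{T}(undef, length(sorted))
    pos = Vector{Int}(undef, length(sorted))
    fill_inorder!(tree, pos, sorted, 1, 1)
    return tree, pos
end

# Same answer as `searchsortedfirst(sorted, x)`, but walking the tree layout.
function eytzinger_searchfirst(tree, pos, x)
    n = length(tree)
    k = 1
    while k <= n
        k = 2k + (tree[k] < x)    # go right if tree[k] < x, else go left
    end
    # Undo the trailing right turns plus the last left turn; this lands on the
    # node of the first element >= x, or on 0 if no such element exists.
    k >>= trailing_ones(k) + 1
    return k == 0 ? n + 1 : pos[k]
end

cdf = cumsum(rand(1_000))
tree, pos = eytzinger(cdf)
i = eytzinger_searchfirst(tree, pos, 0.5 * cdf[end])
```

The loop body has no unpredictable branch (the comparison is used as an integer), which also ties in with the branch-history point above. The extra `pos` lookup costs one more memory access per search; if you only need the value rather than the index, you can drop it.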

Unfortunately, I am not aware of any Julia packages that implement search-optimized layouts for sorted lists.
