Cf. PSA: Microbenchmarks remember branch history
It is even worse here, since your cdf does not fit into the L1 cache. You can probably get a significant speedup by storing the implicit search tree in a cache-oblivious layout instead of as a linearly ordered vector.
Unfortunately, I am not aware of any Julia packages that implement search-optimized layouts for sorted lists.
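In the absence of a package, here is a minimal hand-rolled sketch of the idea. Note that it uses the Eytzinger (breadth-first) layout, which is the simplest search-optimized layout but is cache-friendly rather than strictly cache-oblivious (a van Emde Boas layout would be the cache-oblivious variant). The names `EytzingerVector`, `fill_inorder!` and `eytzinger_searchsortedfirst` are made up for this example, and it assumes a plain 1-based `Vector` that is already sorted.

```julia
# Hypothetical helper: walk the implicit tree in-order and record, for each
# tree node k, which position of the sorted vector belongs there.
# Node k has children 2k and 2k + 1 (1-based indexing).
function fill_inorder!(perm, k, next)
    k > length(perm) && return next
    next = fill_inorder!(perm, 2k, next)   # left subtree gets the smaller elements
    perm[k] = next                         # this node gets the next sorted position
    return fill_inorder!(perm, 2k + 1, next + 1)
end

# Hypothetical container: `tree` holds the values in breadth-first order,
# `perm[k]` is the index that `tree[k]` had in the original sorted vector.
struct EytzingerVector{T}
    tree::Vector{T}
    perm::Vector{Int}
end

function EytzingerVector(sorted::Vector{T}) where {T}
    perm = Vector{Int}(undef, length(sorted))
    fill_inorder!(perm, 1, 1)
    return EytzingerVector{T}(sorted[perm], perm)
end

# Same result as `searchsortedfirst(sorted, x)`: the index (in sorted order)
# of the first element >= x, or length + 1 if there is none.
function eytzinger_searchsortedfirst(e::EytzingerVector, x)
    n = length(e.tree)
    k = 1
    while k <= n
        # Descend: go right (2k + 1) if tree[k] < x, else left (2k).
        k = 2k + (e.tree[k] < x)
    end
    # Undo the trailing right turns plus one left turn to land on the
    # last node where we went left, i.e. the lower bound. 0 means "none".
    k >>= trailing_ones(k) + 1
    return k == 0 ? n + 1 : e.perm[k]
end
```

Usage would look like `e = EytzingerVector(cdf)` once, followed by `eytzinger_searchsortedfirst(e, rand())` per sample. The top levels of the tree sit next to each other in memory, so the first few comparisons of every search tend to hit the same cache lines. Whether this actually beats `searchsortedfirst` for your sizes is something to measure, keeping the branch-history caveat above in mind.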