SIMD: Need some help to speed up sampling a code vector

I like the fixed-point implementation better; it's how the NCOs in hardware GPS receivers work.
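For reference, a fixed-point code NCO can be sketched roughly as follows. It is in Python for illustration only. The helper name `sample_code_fixed_point` and the choice of 32 fractional bits are assumptions, not the original poster's code. The code phase is kept as an integer in units of 2^-32 chips, and the chip index is simply its integer part, so no floating-point `floor` is needed per sample.

```python
FRAC_BITS = 32  # assumed number of fractional bits in the phase accumulator


def sample_code_fixed_point(code, code_freq, sample_freq, start_phase, num_samples):
    """Sample `code` with a fixed-point NCO (hypothetical helper).

    The phase is an integer counting chips in units of 2**-FRAC_BITS;
    the chip index is its integer part (phase >> FRAC_BITS).
    """
    n = len(code)
    one_chip = 1 << FRAC_BITS
    wrap = n * one_chip                                # one full code period
    step = round(code_freq / sample_freq * one_chip)   # chips per sample, fixed point
    phase = round(start_phase * one_chip) % wrap
    out = []
    for _ in range(num_samples):
        out.append(code[phase >> FRAC_BITS])  # integer part selects the chip
        phase = (phase + step) % wrap          # wrap at the end of the code period
    return out
```

For example, `sample_code_fixed_point(list(range(10)), 1.0, 2.0, 9.5, 4)` returns `[9, 0, 0, 1]`. That call takes two samples per chip and starts halfway through the last chip, so the phase wraps to chip 0. For a 1023-chip code with 32 fractional bits, the accumulator stays below 2^42, so a 64-bit integer is enough in a fixed-width language.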

One option is to vectorize both the fixed-point index calculation and the table lookup, using gather instructions for the lookup.

We can do a vectorized load from a lookup table if we can somehow convince Julia to emit a gather instruction such as `vgatherdpd` or `vgatherdps`. Whether this is faster than an indexing loop on a CPU is debatable.
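The data flow of that approach can be sketched in NumPy under the same assumptions as a scalar fixed-point NCO (32 fractional bits, and the hypothetical helper name `sample_code_gather`). First, every sample's chip index is computed at once with integer arithmetic. Then a gather loads one table element per index. NumPy's fancy indexing `code_arr[idx]` is a gather in the logical sense. The sketch says nothing about whether any particular compiler or build actually emits `vgatherdpd`/`vgatherdps` for it.

```python
import numpy as np

FRAC_BITS = 32  # assumed fixed-point format (fractional bits of the code phase)


def sample_code_gather(code, code_freq, sample_freq, start_phase, num_samples):
    """Vectorized fixed-point code sampling (hypothetical helper)."""
    code_arr = np.asarray(code)
    n = code_arr.shape[0]
    one_chip = 1 << FRAC_BITS
    wrap = n * one_chip                               # one full code period
    step = round(code_freq / sample_freq * one_chip)  # chips per sample, fixed point
    phase0 = round(start_phase * one_chip) % wrap
    i = np.arange(num_samples, dtype=np.int64)
    # Step 1: fixed-point phase and chip index for every sample at once
    # (int64 is fine while phase0 + num_samples * step stays below 2**63).
    phases = (phase0 + i * step) % wrap
    idx = phases >> FRAC_BITS
    # Step 2: gather, i.e. one arbitrary table load per sample.
    return code_arr[idx]
```

With a ±1 code, `sample_code_gather([1, -1, -1, 1], 1.0, 2.0, 0.0, 8)` yields `[1, 1, -1, -1, -1, -1, 1, 1]`. This is the same result a scalar indexing loop would give.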

One paper on this reports that it is profitable on an i9-7900X processor with AVX-512.

(In its benchmarks, `reg_standalone` is the scalar indexing loop.)

However, a security update could make this fast vectorized lookup-table code 50% slower.

Another option is to run the code LFSRs in parallel instead of indexing into a lookup table.
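As one possible illustration of the LFSR route, here is a sketch of the GPS L1 C/A Gold code generator. Its G1 register uses the polynomial 1 + x^3 + x^10 and its G2 register uses 1 + x^2 + x^3 + x^6 + x^8 + x^9 + x^10. Both registers start at all ones, and each PRN selects two G2 output taps. The sketch produces chips for several PRNs in one pass. Because every PRN shares the same G1 and G2 registers, one NumPy XOR per clock yields a chip for all requested PRNs. This is only one reading of "parallel", not a tested SIMD implementation. `ca_codes` is a hypothetical name, and the tap table lists just PRNs 1 to 4 (taken from the IS-GPS-200 C/A table).

```python
import numpy as np

# G2 phase-selector tap pairs (1-based stage numbers) for a few PRNs,
# per the IS-GPS-200 C/A code table; extend as needed.
CA_TAPS = {1: (2, 6), 2: (3, 7), 3: (4, 8), 4: (5, 9)}


def ca_codes(prns, num_chips=1023):
    """Generate C/A chips (0/1) for several PRNs at once (hypothetical helper).

    G1 and G2 are shared by every PRN; only the two G2 output taps
    differ, so each clock produces one chip per PRN with a single
    vectorized XOR.
    """
    taps = np.array([CA_TAPS[p] for p in prns]) - 1  # 0-based, shape (len(prns), 2)
    g1 = np.ones(10, dtype=np.uint8)  # index 0 is stage 1, index 9 is stage 10
    g2 = np.ones(10, dtype=np.uint8)
    chips = np.empty((len(prns), num_chips), dtype=np.uint8)
    for k in range(num_chips):
        # Output: G1 stage 10 XOR the two selected G2 stages, for all PRNs.
        chips[:, k] = g1[9] ^ g2[taps[:, 0]] ^ g2[taps[:, 1]]
        # Feedback taps: G1 stages 3, 10; G2 stages 2, 3, 6, 8, 9, 10.
        f1 = g1[2] ^ g1[9]
        f2 = g2[1] ^ g2[2] ^ g2[5] ^ g2[7] ^ g2[8] ^ g2[9]
        # Shift both registers by one stage and insert the feedback bit.
        g1 = np.roll(g1, 1)
        g1[0] = f1
        g2 = np.roll(g2, 1)
        g2[0] = f2
    return chips
```

The first 10 chips of PRNs 1 to 4 from this sketch read 1440, 1620, 1710 and 1744 in octal, which agrees with the IS-GPS-200 table. A real sampler would still need to clock this generator from the code NCO instead of reading a precomputed table.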

I haven't seen anyone do this for GNSS PRN generators, though.
