High-Performance, Lane-Based Segmented Sieve in Julia

Claugo · June 29, 2026, 5:47pm

Hi everyone
I wanted to share a program I’ve been working on within my GC60 project, which focuses on high-magnitude prime searching (beyond the 64-bit barrier). The goal wasn’t just to write a fast sieve, but to explore how a specialized mathematical structure, a Modulo-30 residue system organized into 8 computational lanes, interacts with Julia’s high-level capabilities.

The Challenge:
How can we implement a non-linear, lane-based sieve in a high-level language without being crushed by the overhead of memory management and thread synchronization?

The Implementation:
To achieve near-native performance, I focused on three specific Julia optimizations:

Memory Density: Using BitVector to align segment sizes with CPU L1/L2 cache.
GC Avoidance: Implementing all “Prep” and “Mapping” phases as in-place operations to eliminate mid-computation allocations. Load Balancing: Using Atomic{Int} for a dynamic work-stealing pattern to keep all threads busy despite the irregular distribution of primes across the 8 lanes.

The Results:
In a direct comparison of the core logic:

Primesieve (Kim Walisch):        ~1.911s
M30x8 (Julia Specialized Logic): ~1.046s

Note: While a direct C++ implementation of the M30x8 architecture is faster (~0.6s) due to hardware-level SIMD/AVX2 optimizations, this Julia implementation still significantly outperforms the industry-standard Primesieve (1.911s). This highlights that strategic algorithmic design can often outweigh raw language-level speed in high-performance computing tasks.

Verification:
All results were rigorously validated using SymPy in Python to ensure 100% mathematical correctness.

The full implementation is available on GitHub. The README includes benchmark specifications and timing results up to 10^24, feel free to download it, run it on your own machine, and see for yourself.

Repo: GitHub - Claugo/GC60_M30x8_suffix: High-Performance, Lane-Based Segmented Sieve in Julia · GitHub

Oscar_Smith · June 29, 2026, 6:21pm

That’s a really smart approach! I was just working on getting a segmented sieve for Primes.jl (Segmented sieve for `primes`, add `eachprime` iterator. - Pull Request #120 - JuliaMath/Primes.jl - GitHub), but this approach seems much nicer. Would you consider adding it as a PR to Primes (or any objections if I do)?

One other thing worth noting is that your ranges for the benchmarks are pretty small. For such a small range, rather than sieving you’re a lot better off by doing a basic primality test per number.

julia> using Primes
julia> function num_primes(range)
           num = 0
           for i in range
               isprime(i) && (num += 1)
           end
           num
       end
num_primes (generic function with 1 method)

julia> @time num_primes(Int128(10)^19:Int128(10)^19+10^5)
  0.022324 seconds (308.50 k allocations: 12.371 MiB) # vs 1s for the sieve
2263

langestefan · June 29, 2026, 6:29pm

well done!

out of curiosity what is this? (I can’t find it online)

Claugo · June 29, 2026, 6:41pm

Of course you can do that; I’d really appreciate it

Claugo · June 29, 2026, 6:49pm

This GC-60 project was published on Zenodo records 18657440. From there, algorithms were developed capable of reaching 10^27 in just over 3 hours, such as GC60_M30x3_suffix, which precedes X8. This GC60_M30x8_suffix has a slightly different concept and spans 8 columns, which is why I chose to implement it in Julia, as it seemed well-suited to managing the 8 threads very efficiently.

Oscar_Smith · June 30, 2026, 9:36pm

Segmented sieve for `primes`, add `eachprime` iterator. - Pull Request #120 - JuliaMath/Primes.jl - GitHub here it is. It ended up being a pretty nice speed increase (~30% faster than the previous version of the segmented sieve I had for no extra code)

Claugo · July 1, 2026, 10:27am

Thanks Oscar! The test data is outstanding, especially on smaller numbers. I’m really glad the idea turned out to be useful for the Primes package. Good luck with the future development!

Topic		Replies	Views
LRLN Sieve: High-magnitude prime screening in 54ms using Julia General Usage	1	198	July 10, 2026
A humble request/challenge Optimization (Mathematical)	5	1270	August 8, 2017
An Adventurous Juila Challenge Performance	0	766	December 13, 2018
Fast Generation of Arrays of Prime Numbers Offtopic	32	1726	October 4, 2023
Improving performance in checking prime numbers General Usage	14	2047	March 2, 2021

High-Performance, Lane-Based Segmented Sieve in Julia

Related topics