[YouTube/GitHub] What is the FASTEST Computer Language? 45 Languages Tested

A pretty cool project which includes Julia:


The Julia code he’s running is here

He spells out the rules for the code in episode 2: single-threaded, results preferably stored as a bit vector, and the algorithm faithful to the original C algorithm.

SIMD would be fair game, though.
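For anyone wanting a reference point, the rules above boil down to something like this minimal odds-only sieve (my own sketch for illustration, not one of the drag-race solutions):

```julia
# Minimal odds-only sieve sketch: single-threaded, one bit per odd candidate.
function count_primes(limit::Integer)
    limit < 2 && return 0
    # is_composite[i] tracks the odd number 2i + 1 (index 1 => 3, 2 => 5, ...)
    n = (limit - 1) ÷ 2
    is_composite = falses(n)              # BitVector: one bit per odd candidate
    factor = 3
    while factor * factor <= limit
        if !is_composite[(factor - 1) ÷ 2]
            # Start clearing at factor^2; smaller multiples were already hit.
            # Step by 2*factor so we only touch odd multiples.
            for multiple in (factor * factor):(2 * factor):limit
                is_composite[(multiple - 1) ÷ 2] = true
            end
        end
        factor += 2
    end
    return 1 + count(!, is_composite)     # +1 accounts for the prime 2
end
```

With `limit = 1_000_000` this should report 78498 primes, matching the `Count` in the benchmark output.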


Just a little update: we now have three Julia solutions, and it would be great if anyone else could come take a look and maybe help improve them! Let’s see how close Julia can get to the current kings (C, as expected, and, surprisingly, Zig).

Full disclosure: I am the “author” of solution_3, which uses @simd, bitwise operations, and other tricks similar to PrimeC/sieve_1of2.c, on which I based that solution.

Before solution_1 was updated, here is a small snippet from a benchmark run on my machine (Intel Core i5-9300H, 24GB RAM, Docker under Ubuntu 20.04 under WSL under Windows 10 Home):

After solution_1 was updated, it seems to perform a bit slower than solution_2 (I unfortunately do not have a full benchmark run of this yet):

Primes/PrimeJulia/solution_1 on  drag-race via ஃ v1.6.2 took 5s
❯ julia PrimeSieveJulia.jl
Passes: 5744, Time: 5.000568866729736, Avg: 0.0008705725742913887, Limit: 1000000, Count: 78498, Valid: true


Another snippet of my benchmark runs (filtered for base-faithful-1bit solutions):


On a related note, in this inner loop in solution_3:

    @simd for index in _div2(factor * factor):factor:max_index
        unsafe_set_bit_at_index!(arr, index)
    end

It seems that @simd can actually make the code slower on older machines (such as an older Windows 10 PC or the Raspberry Pi). Could anyone explain why that is?
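For context, the bit-setting helper called in that loop presumably does something along these lines (this is a hedged sketch with assumed names and bit layout; solution_3’s actual implementation may differ):

```julia
# Sketch of a 1-bit-per-entry set operation on a Vector{UInt64} backing store.
# `unsafe_set_bit_at_index!` and `_div2` are stand-ins for the names seen in
# the snippet above; the real solution_3 code may be laid out differently.
_div2(x) = x >> 1

@inline function unsafe_set_bit_at_index!(arr::Vector{UInt64}, index::Integer)
    chunk = ((index - 1) >> 6) + 1        # which 64-bit word (1-based)
    bit   = (index - 1) & 63              # which bit inside that word
    @inbounds arr[chunk] |= UInt64(1) << bit
    return nothing
end
```

The `@inbounds` write-modify-write on shared 64-bit chunks is exactly the kind of operation where vectorization can help or hurt depending on the hardware.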

How significant is compile time for the Julia results? Compile time is probably included for Julia, but not for C, Rust, etc.
To make a fair comparison, the Julia code should be called once before the measurement starts. Alternatively (if this is not allowed), a sysimage could be created in advance.
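Something along these lines (just a sketch; `run_sieve!` and the harness details are assumptions on my part, not the drag-race’s actual benchmark code):

```julia
# Warm-up sketch: call the workload once before the timed region so the
# measured passes exclude Julia's JIT compile time. `run_sieve!` is an
# assumed name for the sieve function under test.
function benchmark(run_sieve!, sieve_size; seconds = 5.0)
    run_sieve!(falses(sieve_size))        # warm-up call: triggers compilation
    passes = 0
    start = time()
    while time() - start < seconds
        run_sieve!(falses(sieve_size))    # each pass gets a fresh bit vector
        passes += 1
    end
    return passes, time() - start
end
```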

Is it possible to speed up calculation further with LoopVectorization.jl?


I can’t speak for the other solutions, but I tried to take compile time into account for the Julia results by calling the run_sieve! function once before the main timed code, though I might not have fully accounted for it.

I did try creating a custom sysimage and using package precompilation, but the sysimage took forever to build and didn’t seem to affect the results much. Package precompilation also seemed to have little effect, maybe just making the results a bit more consistent? I’m not sure.

Is it possible to speed up calculation further with LoopVectorization.jl?

Maybe, but one of the rules is that there should be no external dependencies for actually calculating the sieve. Plus, with bit arrays you need to make sure that no two vectorized writes touch the same chunk at the same time. I think you can guarantee that, though, by checking that the step in the factor-clearing loop is greater than one chunk.
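That chunk-safety check could look something like this (my own illustration, not actual solution code; it assumes 64-bit chunks and 1-based bit indices):

```julia
# Only use the vectorized path when the stride exceeds one 64-bit chunk:
# then any two iterations differ by more than 64 bits and so always land
# in distinct words, ruling out write conflicts between lanes.
const BITS_PER_CHUNK = 64

function clear_factor!(arr::Vector{UInt64}, start, step, stop)
    if step > BITS_PER_CHUNK
        @simd for index in start:step:stop
            @inbounds arr[((index - 1) >> 6) + 1] |= UInt64(1) << ((index - 1) & 63)
        end
    else
        # Small strides can revisit a word, so fall back to a plain loop.
        for index in start:step:stop
            @inbounds arr[((index - 1) >> 6) + 1] |= UInt64(1) << ((index - 1) & 63)
        end
    end
    return nothing
end
```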

Pull requests are really welcome in this regard.


Actually, could StaticArrays.jl help the performance here as well, in addition to LoopVectorization.jl?
