Threads are not speeding up computation a lot

joeldahne · February 26, 2025, 2:14am

To iterate on Benny’s comment about Amdahl’s law. In your benchmarks for test_gen_keys2 the runtime with 1 thread is roughly 30 seconds with a 15% gc time. Assuming the gc is not parallelized at all and the rest is perfectly parallelized, then 4 threads would give a theoretical time of 30 * 0.15 + 30 * 0.85 / 4 = 10.875 seconds, which is not too far of from the 13 seconds you get. Similarly 16 threads would give a theoretical time of 30 * 0.15 + 30 * 0.85 / 16 = 6.09375, which again is not too far of from your 9.5 seconds.

One alternative to BitIntegers.jl could be to use BigInt together MutableArithmetic.jl to reduce the number allocations, that would however likely require rather intrusive changes to the code.

The fmpz type in the FLINT library works like this. I do however not think this can be fully leveraged in Julia. it would likely require some sort of finalizer to check if an allocation was made or not, and finalizers can only be attached to mutable types and those are tracked by the gc (in principle one could possibly insert the finalizers inline with some escape analysis, I think Julia can do something like that but I don’t know how flexible it is).

Topic		Replies	Views
Poor performance while multithreading (Julia 1.0) Performance multithreading	28	4036	February 11, 2019
Default sin / cos functions do not scale on multithreading Performance parallel , multithreading	19	1616	July 13, 2018
Poor performance on cluster multithreading Performance performance , parallel , multithreading , cluster	40	4239	July 11, 2018
Thread overhead variability across machines Performance	13	1891	November 28, 2017
Wrong results in calculation Performance	21	3290	March 12, 2019

Threads are not speeding up computation a lot

Related topics