Threads are not speeding up computation a lot

That 4.4x is nice, though not the ideal 20x. But why do you have any allocations at all? And why more with threading? That could be your limiter. I'm not sure, but Bumper.jl could help, though if you switch to a bits type it may not even be needed. With Bumper you can allocate in a hot loop and the memory is freed immediately when the scope exits, stack-like. That immediate reclamation is likely part of why Python does okay here with its reference-counting GC, but it should be even faster with Bumper. Bumper seems to support threads, though I haven't looked closely at that. BigInt and BigFloat are performance killers; both have better alternatives in most cases (but since they're in the standard library, they're the go-to for many people), as you've discovered, which is why I want them gone from standard Julia…
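For reference, the allocate-in-a-hot-loop pattern I mean looks roughly like this (a minimal sketch using Bumper.jl's `@no_escape`/`@alloc` macros; the function and buffer size are just placeholders, not from your code):

```julia
using Bumper

function hot_loop(n)
    total = 0.0
    for _ in 1:n
        @no_escape begin
            # buf comes from Bumper's bump allocator, not the GC;
            # it is reclaimed as soon as the @no_escape block exits
            buf = @alloc(Float64, 100)
            buf .= 1.0
            total += sum(buf)
        end
    end
    return total
end
```

The key restriction is that nothing allocated with `@alloc` may escape the `@no_escape` block, which is what makes the stack-like freeing safe.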

@yolhan_mannes, this is a bit strange:

> julia -t 1 --project main.jl
  0.304315 seconds (10 allocations: 6.104 MiB, 11.04% gc time)

That GC time doesn't happen with 20 threads. It also varies between runs, even with just 1 thread and the same constant overhead of 10 allocations.

[A recent change made 1 interactive thread part of the new default, but on current versions you may want to get in the habit of `-t 20,1` or `-t auto,1`. I doubt it matters in this case, though.]
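Concretely, that habit looks like this (a sketch; substitute your own thread count and project path):

```julia
# From the shell: N compute threads plus 1 interactive thread
#   julia -t 20,1 --project main.jl
#   julia -t auto,1 --project main.jl   # let Julia pick the compute count

# Inside Julia you can check what you got:
using Base.Threads
println(nthreads())        # number of threads in the default pool
```

The second number after the comma is the interactive threadpool size, which keeps the REPL and other interactive tasks responsive while compute threads are busy.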
