Threads are not speeding up computation a lot

What do you think was widely misrepresented so far? The worse performance of BigInt was attributed to its allocations, which the issue you linked talks about. Speedup was stated to be sublinear even in ideal circumstances, and you corroborated that with your multithreading benchmark for BigInt and the non-allocating Int. On the facts, you’re just agreeing with us.

This is misleading. The difference is ~27x on my machine (between 1 and 2 orders of magnitude) even when the GC doesn’t run, and it’s accounted for by iszero checking the BigInt’s size field rather than directly comparing its more complicated value to another integer, which first has to be promoted to a BigInt. The BigInt instantiation takes most of the time, so the difference drops to ~2.4x if you preallocate BigInt(0) before benchmarking ==. Not really sure how that can improve further, because == just ccalls a GMP function.
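A rough way to see this from the REPL, peeking at BigInt’s internal size field (not public API, so this is just an illustration of the point above):

```julia
# iszero(x::BigInt) only reads the GMP size field, while x == 0 goes
# through a ccall into GMP's comparison after promoting the 0.
big(0).size      # 0 limbs: the value is zero
big(1234).size   # nonzero for any nonzero value
iszero(big(0))   # true, decided from the size field alone
```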

2 Likes

I tried Int256; it didn’t fit, but UInt256 did. 512-bit integers slowed things down a lot when I tried them, but if you have high enough memory bandwidth, threads could still work nicely. You won’t be able to see stack pressure with Julia’s @time macro, but there may be tools to see it.

2 Likes

Agree. One order of magnitude.

Not really. The JIT (or whatever) must see that this is a constant, and that it is 0.
And choosing the faster code path is an obvious thing to do.

And I was already relieved to be done with benchmarking.
Here is my code (maybe I did something stupid here):


function t1(a)
    for i in 1:100000000
        if a == 0
            return 1
        end
    end
    return 0
end

function t2(a)
    BZ = BigInt(0)
    for i in 1:100000000
        if a == BZ
            return 1
        end
    end
    return 0
end

function t3(a)
    for i in 1:100000000
        if iszero(a)
            return 1
        end
    end
    return 0
end

a = BigInt(1234)

t1(a)
t2(a)
t3(a)

@time t1(a)
@time t2(a)
@time t3(a)

Results are:

0.195189 seconds
0.294741 seconds (2 allocations: 40 bytes)
0.000006 seconds

This in no way solves the problem of basically no scaling, and it doesn’t propose any way to fix it.
Is my code wrong? Then what can fix it?
Does Julia just have slow numbers?
Yes, a lot of allocation is happening, and there are some scaling laws that I presumably do not understand.
So maybe I misunderstood people, and they were not trying to avoid telling me that Julia’s BigInts are insanely inefficient and should be avoided, in the best corporate HR tradition, but just wanted to tell me about general computation principles, because it is a breathtakingly interesting topic.

And I got extremely frustrated by benchmarking for two days instead of doing my research.
So I am sorry. I would gladly discuss these topics in a different thread.

That’s what we ask of everyone here. Nobody is gaslighting you; lots of folks have participated in this thread in good faith, freely offering their thoughts and advice.

What is offensive is comparing this to war crimes and genocide. Let’s not do that here, please.

If you have further questions, please don’t hesitate to DM me or the other moderators.

4 Likes

Understood. I edited the post. I hope it is more acceptable now. Thank you.

2 Likes

You are talking about Julia code here? I don’t know how much caching can help there, considering that BigInts are immutable anyway.
At this stage, I just want to fill a matrix with them, so maybe I’ll even use some C lib for this (if it won’t be too much pain).
But then I still need to do matrix multiplication of this huge BigInt matrix and similar stuff. So BigInt is probably not the option for me here.

But about the internal implementation: I think Julia should allocate these on the stack if possible, and on the heap only if they are referenced by an array or something.
I also didn’t look at what happens in the GMP lib calls, but I just assume some unnecessary synchronization is happening there.

Ah yeah, you’re right. I don’t know how to make it look nice, because the first thing I tried with your module was to make sure the function’s returns were always of the same type: I changed every return 0,0,1 to return big(0),big(0),big(1). But because a BigInt allocates when created, that was worse, and it’s not the kind of thing I like to cache. Yet it would be the only way to avoid BitIntegers: caching those Ints that otherwise get converted to Big at every iteration. Or, worse, your function returns Union{Int,BigInt}, which is what killed your first program and created those GB of allocations. You could create const myzerobig = big(0) and const myonebig = big(1), but it’s ugly. That was the case with the _inv function, which ultimately returned a Union and led to every other call being made on the Integer abstract type, and that hurt badly.

1 Like

I think using the @big_str macro, à la big"0" and big"1", instead of big(0) and big(1), will let these share instances instead of reallocating every time.

I tried to document this once (add docs for big_str instance sharing by adienes · Pull Request #50942 · JuliaLang/julia · GitHub), but it wasn’t super well received.

julia> foo(x) = big(1) + x
foo (generic function with 1 method)

julia> bar(x) = big"1" + x
bar (generic function with 1 method)

julia> baz(x) = Base.GMP.MPZ.add!(x, big"1")
baz (generic function with 1 method)



julia> @btime foo(x) setup=x=rand(Int)
  38.771 ns (4 allocations: 80 bytes)
-4368964967845838583

julia> @btime bar(x) setup=x=rand(Int)
  19.559 ns (2 allocations: 40 bytes)
807978620266249972

julia> @btime baz(x) setup=x=big(rand(Int))
  6.750 ns (0 allocations: 0 bytes)
-5464924412310845126

Wait, this exists? :scream: What’s the diff with BitIntegers? If someone could take the code I left (the one with the thread/time plot) and change all the uint256"…" to @big_str big"…", just to see, I would be grateful.

Can you use a rand(BigInt) instead, to avoid the conversion in +?
edit: never mind, it doesn’t change anything; 2 fewer allocations is nice, but the result of + will still be allocated.
edit2: Base.GMP.MPZ.add! seems the way to go. I thought BigInt was immutable?

2 Likes

I think it will still allocate, because it has to create a new BigInt for the returned sum (unless you do the addition in place with nonpublic functions). The @big_str trick only works for literal values; I don’t believe you can pass in a variable.

Yes, BigInt is immutable on the Julia side (fitting with Julia’s other Numbers). The library it uses, GMP, is itself quite optimized, but for mutation. That’s the fundamental mismatch here.

Behind the scenes it is possible to mutate, but doing so without very careful consideration can lead to some pretty big surprises.
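As a sketch of the kind of surprise meant here (one_big is a made-up name, and Base.GMP.MPZ is nonpublic): since big"1" splices one shared instance into the method body, mutating it in place silently changes every later use of that “literal”.

```julia
# The @big_str literal is created once and shared across calls:
one_big() = big"1"

x = one_big()
Base.GMP.MPZ.add!(x, big"41")  # in-place add: x is now 42

one_big()  # returns 42, not 1: the shared instance was mutated
```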

1 Like

So GMP is unsafe? Mutating an immutable is weird.

Julia’s compiler does do constant work ahead of time, but there are still things that can reasonably get in its way. In this case, the constant part involves a heap allocation for an arbitrary-precision value. Worst case, it preallocates something so Big it uses up your memory, and you’re a few calls away from a crash if you haven’t crashed already. Even if the compiler is made smart enough to make sure the preallocated size doesn’t get too big, that’ll scale poorly with your codebase. Preallocation is thus usually left up to the user; you could use a const Bigzero = BigInt(0) in your code if you need to.
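A minimal sketch of that user-side preallocation (the names BIGZERO and is_big_zero are just for illustration):

```julia
# Allocate the constant once, at load time, instead of per call.
const BIGZERO = BigInt(0)

# Reuses the shared constant, so no BigInt(0) is constructed per comparison.
is_big_zero(x::Integer) = x == BIGZERO
```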

Not stupid; in fact it’s much better than other first attempts, because you used multiple runs to account for timing resolution. But throwing things in a loop generally risks the compiler optimizing away the things you need to benchmark; worst case, you don’t have a loop at all. BenchmarkTools.jl, and correct usage of its $-interpolation, is how you make sure you measure the things you intend to benchmark.
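A hedged sketch of what that looks like (assumes BenchmarkTools.jl is installed):

```julia
using BenchmarkTools  # not a stdlib; add it with Pkg first

a = BigInt(1234)

# $a splices the value into the benchmark expression, so you time the
# comparison itself rather than access to an untyped global:
@btime $a == 0
@btime iszero($a)
```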

You got things a little backwards. BigInts don’t have a mutation API, but the type is implemented as a mutable type; check ismutabletype. GMP’s arbitrary-precision integers, which Julia wraps, allocate on the heap to begin with, because arbitrary precision can’t have a fixed size. Julia’s mutable-type wrapper does add a small allocation, but that’s the lesser issue.
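That distinction can be checked directly:

```julia
# BigInt's public API treats values as immutable, but the underlying
# type is a mutable struct wrapping GMP's heap-allocated limb buffer:
ismutabletype(BigInt)  # true
ismutabletype(Int)     # false: an immutable bits type
```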

That’s why FLINT’s ability to save allocations for smaller values was mentioned, but as far as I know, the Julia package ecosystem that wraps it uses mutable types to reflect the FLINT API (much like GMP’s API), which adds a small allocation for memory safety. Like the issues you linked mention, a similar Julia type could avoid heap allocations for smaller values, but that’ll take work, possibly on the language core as well.

Not much. With every integer allocating data on the heap, you’re running into hardware limits that every language has to deal with. Even if we optimize smaller values with stack allocation, Amdahl’s law is a hard mathematical limit. If you usually benchmark embarrassingly parallel programs on fewer cores, speedup might appear linear, but that’s just a portion of the nonlinear curve. Nobody can defeat math.
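To make that concrete, a sketch of Amdahl’s law (the helper name is made up): with parallel fraction p on n cores, speedup is capped at 1/(1-p) no matter how many cores you add.

```julia
# Amdahl's law: the serial fraction (1 - p) limits the overall speedup.
amdahl_speedup(p, n) = 1 / ((1 - p) + p / n)

amdahl_speedup(0.95, 8)     # ≈ 5.9x on 8 cores, already well below 8x
amdahl_speedup(0.95, 10^6)  # approaches the 1/(1 - 0.95) = 20x ceiling
```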

2 Likes

Not “unsafe”, afaik, in the sense that your program will be corrupted, but definitely unsafe in the sense that you should be very careful or you’ll probably get some funny results.

No, in the eyes of GMP it’s definitely a mutable interface. And GMP makes no such assumptions about it being immutable.

The trouble is that Julia assumes its Numbers won’t change. It’s a fundamental mismatch in assumptions between the two softwares.

2 Likes

I see; I would still go with:

julia> mutable struct MBInt
           bi ::BigInt
       end

julia> myadd!(a::MBInt,b::MBInt) = Base.GMP.MPZ.add!(a.bi,b.bi)
myadd! (generic function with 2 methods)

julia> a = MBInt(1)
MBInt(1)

julia> @btime myadd!($a,$a);
  1.779 μs (0 allocations: 0 bytes)

just to be sure ( I know it doesn’t need to be mutable but …)

Yes, allocating (or rather its implied free/GC) harms all thread scaling, not just BigInt’s. No more or less, though, I believe, for it than for other things, e.g. Strings.

No.

julia> @btime BigInt(1) == 0;
  61.369 ns (2 allocations: 40 bytes)

julia> @btime iszero(BigInt(1));
  58.754 ns (2 allocations: 40 bytes)

At most a 4% difference, and I think I’m actually measuring a real difference, not just random fluctuation. So, OK, why? Note: in case you meant it with a unary minus, then the numbers are not comparable:

julia> @btime -BigInt(1) == 0;
  120.575 ns (4 allocations: 80 bytes)

julia> @btime iszero(-BigInt(1));
  119.961 ns (4 allocations: 80 bytes)

Note also that you were mostly not comparing comparisons, but rather the BigInt function, i.e. the constructor, and its necessary(?) allocation. With the big macro, constructing at compile time:

julia> c = big"0";

julia> @btime c == 0;
  23.781 ns (0 allocations: 0 bytes)

julia> @btime iszero(c);
  19.693 ns (0 allocations: 0 bytes)

Now you get (understandably) a larger, 20% relative difference (and about 2.5x faster for each, because you’re not measuring the comparatively slow allocation).

julia> @code_lowered c == 0
CodeInfo(
1 ─ %1 = Base.GMP.cmp(x, i)
│   %2 = %1 == 0
└──      return %2
)

julia> @code_lowered iszero(c)
CodeInfo(
1 ─ %1 = Base.getproperty(x, :size)
│   %2 = %1 == 0
└──      return %2
)

julia> @code_lowered c == big"0"
CodeInfo(
1 ─ %1 = Base.GMP.cmp(x, y)
│   %2 = %1 == 0
└──      return %2
)

The first and last are very similar, so you might think they produce the same assembly, with only an i → y change… but it’s not the same (why not?):

julia> @code_native c == big"0"

already rather good, and even better with (probably optimal):

julia> @code_native iszero(c)
1 Like

Does it go far enough that the assumption can’t feasibly be removed by more specific, possibly Holy trait-ed, methods? For example, BigInt-s use GMP’s comparisons for ==, not Number’s fallback forwarding to ===. There were packages attempting mutable Number subtypes, at the very least.

The absence of $-interpolation is throwing your benchmark off.

People who come here for parallelism and then realize they don’t understand integers (I’m one of them now) :joy: Just kidding; it’s actually really weird how they work in Julia. BitIntegers seems much nicer, but the pressure it puts on the stack makes me understand why BigInt was built like this.

1 Like