# Memory consumtion Rational{BigInt} on bernoulli number example

I got a little bored yesterday and tried some examples from rosettacode http://rosettacode.org/wiki/Bernoulli_numbers#Julia to test the performance against the Python code. Was kind of disappointing, Python worked better on the first run. Especially the memory consumption when measuring with @btime

I changed the code a bit and shortened the padding part for the formatted output.

function bernoulli(n,A)
for m = 0:n
A[m+1] = 1 // (m+1)
for j = m:-1:1
A[j] = j * (A[j] - A[j+1])
end
end
return A[1]
end

function display(n)
A = Vector{Rational{BigInt}}(undef, n + 1)
B=map(x->bernoulli(x,A), 0:n)
for (i, val) in enumerate(B)
(val.num != 0) && println("B(\$(i-1))", lpad(val.num,45+Int(i<10)), " / ", val.den)
end
end

@btime display(60)

B(0)                                             1 / 1
B(1)                                             1 / 2
B(2)                                             1 / 6
B(4)                                            -1 / 30
B(6)                                             1 / 42
B(8)                                            -1 / 30
B(10)                                            5 / 66
B(12)                                         -691 / 2730
B(14)                                            7 / 6
B(16)                                        -3617 / 510
B(18)                                        43867 / 798
B(20)                                      -174611 / 330
B(22)                                       854513 / 138
B(24)                                   -236364091 / 2730
B(26)                                      8553103 / 6
B(28)                                 -23749461029 / 870
B(30)                                8615841276005 / 14322
B(32)                               -7709321041217 / 510
B(34)                                2577687858367 / 6
B(36)                        -26315271553053477373 / 1919190
B(38)                             2929993913841559 / 6
B(40)                       -261082718496449122051 / 13530
B(42)                       1520097643918070802691 / 1806
B(44)                     -27833269579301024235023 / 690
B(46)                     596451111593912163277961 / 282
B(48)                -5609403368997817686249127547 / 46410
B(50)                  495057205241079648212477525 / 66
B(52)              -801165718135489957347924991853 / 1590
B(54)             29149963634884862421418123812691 / 798
B(56)          -2479392929313226753685415739663229 / 870
B(58)          84483613348880041862046775994036021 / 354
B(60) -1215233140483755572040304994079820246041491 / 56786730
-> 207.252 ms (2365816 allocations: 21.87 MiB)

How can I reduce memory consumption and increase speed?

best regards

Michael

BigInts don’t have a great performance in Julia. Have a look on BitIntegers.jl package. For me this works much better than BigInts (100x faster)

3 Likes

Tested this once and only get error messages like this

LLVM ERROR: Program used external function ‘__umodti3’ which could not be resolved!

best regards

Michael

Very interesting, could you share code that causes the error. It should work. What Julia version are you using?

Also, please mention how you installed Julia. Using the normal binaries or some distro provided installation?

Ahh false alarm . Everything ok. It was a mistake of mine with using @define_integers

best regards

Michael

1 Like

I tried using MutableArithmetics.jl, but while they support BigInt, they don’t yet support Rational{BigInt}. I tried to add some rudimentary support for Rational{BigInt} and managed to get much better performance:

130.712 ms (2307623 allocations: 43.46 MiB)
19.572 ms (101886 allocations: 1.92 MiB)

where the first is the original script, while the second is the modified one (the full code is below). However, I ended up still needing to use some low-level functions in the final script; maybe there is a better way to do it with MutableArithmetics? (cc @blegat).

It’s my first attempt at using the package, although I’ve wanted to for awhile, since it looks very cool.

Full code
using MutableArithmetics, BenchmarkTools

@eval MutableArithmetics begin
mutability(::Type{Rational{BigInt}}) = IsMutable()
mutable_copy(x::Rational{BigInt}) = deepcopy(x)

# zero
promote_operation(::typeof(zero), ::Type{Rational{BigInt}}) = Rational{BigInt}
mutable_operate!(::typeof(zero), x::Rational{BigInt}) = (mutable_operate!(zero, x.num); mutable_operate!(one, x.den); x)

# one
promote_operation(::typeof(one), ::Type{Rational{BigInt}}) = Rational{BigInt}
mutable_operate!(::typeof(one), x::Rational{BigInt}) = (mutable_operate!(one, x.num); mutable_operate!(one, x.den); x)

function mutable_operate_to!(output::BigInt, ::typeof(gcd), x::BigInt, y::BigInt)
Base.GMP.MPZ.gcd!(output, x, y)
end

function mutable_operate!(::typeof(gcd), x::BigInt, y::BigInt)
Base.GMP.MPZ.gcd!(x, y)
end

function mutable_operate_to!(output::BigInt, ::typeof(div), x::BigInt, y::BigInt)
Base.GMP.MPZ.tdiv_q!(output, x, y)
end

function mutable_operate!(::typeof(div), x::BigInt, y::BigInt)
Base.GMP.MPZ.tdiv_q!(x, y)
end
# +, -
promote_operation(::Union{typeof(+), typeof(-)}, ::Rational{BigInt}, ::Rational{BigInt})  = Rational{BigInt}

buffer_for(::Union{typeof(+), typeof(-)}, ::Rational{BigInt}, ::Rational{BigInt}) = BigInt()

function mutable_operate_to!(output::Rational{BigInt}, op::Function, x::Rational{BigInt}, y)
return mutable_buffered_operate_to!(buffer_for(op, x, y), output, op, x, y)
end

function mutable_buffered_operate_to!(buffer::BigInt, output::Rational{BigInt}, op::Union{typeof(+), typeof(-)}, x::Rational{BigInt}, y::Rational{BigInt})
# non-mutating algorithm:
# g = gcd(x.den, y.den)
# xd = div(x.den, g)
# yd = div(y.den, g)
# op(x.num * yd, y.num * xd) // (x.den * yd)

mutable_operate_to!(buffer, gcd, x.den, y.den) # buffer = g = gcd(x.den, y.den)
mutable_operate_to!(output.num, div, x.den, buffer) # output.num = xd = div(x.den, g)
mutable_operate_to!(output.den, div, y.den, buffer) # output.den = yd = div(y.den, g)

# `buffer` no longer needs to store `g`
# Now we need to store `xd`
Base.GMP.MPZ.set!(buffer, output.num)

mutable_operate_to!(output.num, *, x.num, output.den) # output.num = x.num * yd

mutable_operate!(*, buffer, y.num) # buffer = xd * y.num
mutable_operate!(op, output.num, buffer) # output.num = op(x.num * yd, xd * y.num)
mutable_operate!(*, output.den, x.den)

# Now `output` is complete, but isn't in lowest terms yet
mutable_operate_to!(buffer, gcd, output.num, output.den)
mutable_operate!(div, output.num, buffer)
mutable_operate!(div, output.den, buffer)

output
end

function mutable_operate!(op::Function, x::Rational{BigInt}, args::Vararg{Any, N}) where N
mutable_operate_to!(x, op, mutable_copy(x), args...)
end

buffer_for(::typeof(*), ::Rational{BigInt}, ::Rational{BigInt}) = (BigInt(), BigInt())

function mutable_buffered_operate_to!(buffer::Tuple{BigInt, BigInt}, output::Rational{BigInt}, ::typeof(*), x::Rational{BigInt}, y::Rational{BigInt})
# non-mutating algorithm:
# g1 = gcd(x.num, y.den)
# xn = div(x.num, g1)
# yd = div(y.den, g1)
# g2 = gcd(x.den, y.num)
# xd = div(x.den, g2)
# yn = div(y.num, g2)
# (xn * yn) // (xd * yd)

g1 = buffer[1]
mutable_operate_to!(g1, gcd, x.num, y.den) # g1 = gcd(x.num, y.den)
mutable_operate_to!(output.num, div, x.num, g1) # output.num = xn = div(x.num, g1)
mutable_operate_to!(output.den, div, y.den, g1) # output.den = yd = div(y.den, g1)

# Now g1 is no longer needed
g2 = buffer[1]
mutable_operate_to!(g2, gcd, x.den, y.num) # g2 = gcd(x.den, y.num)
yn = buffer[2]
mutable_operate_to!(yn, div, y.num, g2) # yn = div(y.num, g2)
mutable_operate!(*, output.num, yn) # output.num is finished; equals xn*yn

xd = buffer[2]
mutable_operate_to!(xd, div, x.den, g2) # xd = div(x.den, g2)
mutable_operate!(*, output.den, xd) # output.den = xd * yd

# Now `output` is complete, but isn't in lowest terms yet
g = buffer[1]
mutable_operate_to!(g, gcd, output.num, output.den)
mutable_operate!(div, output.num, g)
mutable_operate!(div, output.den, g)

output
end

buffer_for(::typeof(*), ::Rational{BigInt}, ::BigInt) = BigInt()

function mutable_buffered_operate_to!(buffer::BigInt, output::Rational{BigInt}, ::typeof(*), x::Rational{BigInt}, y::BigInt)
g = buffer
mutable_operate_to!(output.num, *, x.num, y)
Base.GMP.MPZ.set!(output.den, x.den)

# Now `output` is complete, but isn't in lowest terms yet
mutable_operate_to!(g, gcd, output.num, output.den)
mutable_operate!(div, output.num, g)
mutable_operate!(div, output.den, g)

output
end

end

using Test
@testset begin
a = big(5)//big(2)
b = big(3)
c = big(1)//big(1)

MutableArithmetics.mutable_operate_to!(c, *, a, b)
@test c == a*b

a = big(2) // big(3)
b = big(5) // big(7)

MutableArithmetics.mutable_operate_to!(c, -, a, b)
@test c == a - b
MutableArithmetics.mutable_operate_to!(c, +, a, b)
@test c == a + b
MutableArithmetics.mutable_operate_to!(c, *, a, b)
@test c == a * b

a = big(2)//big(5)
MutableArithmetics.one!(a)
@test a == big(1)//big(1)

MutableArithmetics.zero!(a)
@test a == big(0)//big(1)
end

function bernoulli(n,A)
for m = 0:n
A[m+1] = 1 // (m+1)
for j = m:-1:1
A[j] = j * (A[j] - A[j+1])
end
end
return A[1]
end

function bernoulli2(n,A)
buffer_int = big(1)
buffer_rat = big(1)//big(1)

for m = 0:n
A[m+1] = 1 // (m+1)
for j = m:-1:1
Base.GMP.MPZ.set!(buffer_rat.num, A[j].num)
Base.GMP.MPZ.set!(buffer_rat.den, A[j].den)
MutableArithmetics.mutable_buffered_operate_to!(buffer_int, A[j], -, buffer_rat, A[j+1])
Base.GMP.MPZ.set!(buffer_rat.num, A[j].num)
Base.GMP.MPZ.set!(buffer_rat.den, A[j].den)
MutableArithmetics.mutable_buffered_operate_to!(buffer_int, A[j], *, buffer_rat, big(j))
end
end
return A[1]
end

function bernoulli_display(n, b; do_print = false)
A = Vector{Rational{BigInt}}(undef, n + 1)
B=map(x->b(x,A), 0:n)
for (i, val) in enumerate(B)
(val.num != 0) && do_print && println("B(\$(i-1))", lpad(val.num,45+Int(i<10)), " / ", val.den)
end
end

@btime bernoulli_display(60, bernoulli)
@btime bernoulli_display(60, bernoulli2)
## set `do_print = true` to confirm they agree
# bernoulli_display(60, bernoulli; do_print = true)
# bernoulli_display(60, bernoulli2; do_print = true)

edit: fixed a mistake in the code

1 Like

IN:

using IntegerSequences, BenchmarkTools

println.(BernoulliList(9))

@btime BernoulliList(10)
@btime BernoulliList(60)
@btime BernoulliList(1000)

OUT:

1
1//2
1//6
0
-1//30
0
1//42
0
-1//30

7.456 μs (107 allocations: 3.84 KiB)
685.911 μs (2122 allocations: 41.08 KiB)
587.003 ms (505265 allocations: 7.82 MiB)

Hi @ericphanson
I have tested your code and my results are as follows

187.566 ms (2364864 allocations: 21.85 MiB)
40.846 ms (104149 allocations: 982.47 KiB)

This is already a big difference between the two results .

1 Like

Glad to hear it! There’s now an issue open on MutableArithmetics.jl to add support for rationals, so hopefully this will be easier at some point in the future. Ideally at some point one would only need to add MutableArithmetic’s @rewrite macro on the line which updates A[j], although I’m not sure that can ever get as fast as reusing the buffers over the course of the loop as I did in my solution. On the other hand there might be other tricks that can be incorporated to speed it up; I don’t know :).

just discovered another problem. Wanted to do some timing outside of Atom Juno in the bash and noticed a big time delay. If I exclude MutableArithmetics and run the old code, the delay does not exist. The second measurement does not get better…

time julia Bernoulli.jl

real	0m12.097s
user	0m11.760s
sys	0m0.416s

Could you take a measurement at your place for comparison?

best regards
Michael

Hi @mwolff, if I modify my script to not do the benchmarking and just do bernoulli_display(60, bernoulli2; do_print = true), I get

ephwork:BernoulliBigInt eh540\$ time julia --project=. mut-arith.jl
Test Summary: | Pass  Total
test set      |    6      6
B(0)                                             1 / 1
B(1)                                             1 / 2
B(2)                                             1 / 6
B(4)                                            -1 / 30
B(6)                                             1 / 42
B(8)                                            -1 / 30
B(10)                                            5 / 66
B(12)                                         -691 / 2730
B(14)                                            7 / 6
B(16)                                        -3617 / 510
B(18)                                        43867 / 798
B(20)                                      -174611 / 330
B(22)                                       854513 / 138
B(24)                                   -236364091 / 2730
B(26)                                      8553103 / 6
B(28)                                 -23749461029 / 870
B(30)                                8615841276005 / 14322
B(32)                               -7709321041217 / 510
B(34)                                2577687858367 / 6
B(36)                        -26315271553053477373 / 1919190
B(38)                             2929993913841559 / 6
B(40)                       -261082718496449122051 / 13530
B(42)                       1520097643918070802691 / 1806
B(44)                     -27833269579301024235023 / 690
B(46)                     596451111593912163277961 / 282
B(48)                -5609403368997817686249127547 / 46410
B(50)                  495057205241079648212477525 / 66
B(52)              -801165718135489957347924991853 / 1590
B(54)             29149963634884862421418123812691 / 798
B(56)          -2479392929313226753685415739663229 / 870
B(58)          84483613348880041862046775994036021 / 354
B(60) -1215233140483755572040304994079820246041491 / 56786730

real    0m10.392s
user    0m7.491s
sys     0m0.592s

I agree that’s a bit long. One thing that might help is that when the rational stuff gets put in MutableArithmetics.jl proper, it will be precompiled along with the rest of the package, which should help speed things up on the first run (after the package is precompiled, that is). The other thing is that compiler latency is a general problem that’s being addressed on several fronts (search discourse for the “time to first plot problem”, for example). For this reason, usually Julia is not used the way e.g. Python is, by writing scripts and then calling them from the command line, but instead is used by keeping a Julia session open for a long time (even days), and e.g. include(...)ing files into the session (edit: or using Juno or similar). See also the workflow tips in the manual, especially about using Revise.jl.

While hopefully the compiler can continue to get faster (e.g. Julia 1.3 is already a big improvement over Julia 1.0 for me in many cases), the compiler latency might never go away completely, because it’s inherent to how Julia attains its runtime speed that it compiles methods before they are used the first time (currently, the first time they are used in a given session, unless they are in a package and can get precompiled or you compile them into your sysimg using e.g. PackageCompilerX.jl).

Ok and I thought this was a problem with my system or a setting error, because everything goes faster in Atom Juno.
I already tried PackageCompilerX, but it takes a long time to create a binary on my system. In that time I could jog to the village and back.
It’s a mystery to me how the code with luajit on rosettacode only needs 0.022s, according to their timing

Ok and I thought this was a problem with my system or a setting error, because everything goes faster in Atom Juno.

Probably in Juno you are using the same session like I described in my post?

I already tried PackageCompilerX, but it takes a long time to create a binary on my system. In that time I could jog to the village and back.

Yeah, I think PackageCompilerX isn’t a general solution but rather is for specific situations where the tradeoffs are worth it.

It’s a mystery to me how the code with luajit on rosettacode only needs 0.022s, according to their timing

I’ve no idea how luajit works, but we do achieve that timing in my optimized version on my laptop when benchmarking (i.e. excluding compilation times). In fact, we are probably doing the same things i.e. using low-level GMP methods to reuse memory. My hope is that with MutableArithmetics, however, soon the Julia version can achieve that speed while being much more readable. I think that’s a worthy tradeoff compared to speed when called from the command line (where luajit has the clear advantage in this case), but of course it all depends on what you need it for.

I hope it gets better sometime with the latencies. For interest I had the code compiled with PackageCompilerX and came up with the following values in Bash with time

real	0m0.461s
user	0m0.400s
sys	0m0.156s

best regards
Michael

1 Like