Tonight I hated julia, REPL and overflows

…and we lost him :wink:

5 Likes

What I meant is that if you are running a single piece of code multitple times and that line of code either needs or not need BigInt, then that line of code will need BigInt way more often than not. This is simply because there are way more BigInt than Int. If you are running a piece of code only once (not in a loop and not with different inputs), then the performance of it doesn’t matter.

Of course it’s entirely possible that only a small portion of your code needs BigInt and you should limit your use of BigInt to those few lines of code.

2 Likes

Normally, big numbers appears in divisions (by another big number) so an alternative approach would be to contain those divisions inside functions.

This is not the way it goes. A typical example I had recently is when compute the size of a set, about 10^9 intermediate computations did fit with in Int64 or Int128 and only about 30 did not fit. I have the choice to redo entirely the computation in BigInt (very slow) or (better) to use SaferIntegers but then it is quite complicated even with a try block to switch to BigInt to go over an overflow case and back to SafeInt when possible…

So you are saying you repeated the computation 10^9 times?

Maybe try BitIntegers.jl, they work well for me, much faster than bigints

No. I said that the computation demanded 10^9 intermediate computations, which were all different, and that about 30 of them overflowed Int128. And that this is fairly typical of the kind of computations that I have to do in representation theory of Hecke algebras, one of my topics.

I did a quick test of Int256. In this test it is slower that BigInt, probably because of divisions…

function test_hm(rtype,rank)
  hm(rank)=[rtype(1)//(n+m) for n in 1:rank, m in 1:rank]
  m=hm(rank)
  one(m)==m*inv(m)
end
julia> @btime test_hm(Int256,45)
  3.213 s (56720623 allocations: 1.12 GiB)
true

julia> @btime test_hm(BigInt,45)
  703.989 ms (10933914 allocations: 210.82 MiB)
true
1 Like

And that’s not what I was saying. 30 out of 10^9 doesn’t matter. The question is how many expressions in your code overflows. Only repeated calculation that can overflow matters.

Well, actually, it is a single one (complicated) expression which overflows (about 30 times in 10^9).
The values of the variables in the expression are different each time and I cannot predict which values
will make the expression overflow.

Errrr, a complicated expression is multiple expressions.

And in any case, what’s special about typemax(Int128) in your calculations for it to be a effective upper bound of those numbers or what’s the distribution of the numbers involved.

2^128 is just the largest number for which integer arithmetic is fast. The majority of numbers (perhaps 99%) actually fit in Int32. Perhaps 1% do not fit in Int32, and 0.001% do not fit in Int64. This kind of Poisson law is typical of what happens in my computations…

If this kind of distribution is what you have then it’s better to just detect the error and redo the ones with higher precision. In this case you still pretty much want to optimize only for the cases that fits instead of trying to make every operation safe and slower.

1 Like

detect the error and redo with higher precision is exactly what the union type we are discussing can do automatically for you, saving you a lot of difficult programming

1 Like

It would be great if someone who has a need for this would create a package for this kind of approach as an experiment.

Obviously there would be a performance penalty compared to Int, but I am not sure how big that would be with modern CPUs if the BigInt branch is actually rarely taken. It may just be a tolerable trade-off for some applications, especially compared to always using BigInt, but we won’t know until it is tried.

Look at the Nemo package, Nemo.fmpz is a wrapper over flint big integers, which uses GMP for actual big numbers, but otherwise stores an small integer inline. Unfortunately Nemo.fmpz is not currently a subtype of Integer, so this can limit its applicability as a replacement for BigInt.

2 Likes

No not for each operation, for the whole complex expression.

this situation actually begs for a julian Flint.jl which everybody would benefit from.
But I’m not going to open this can of worms :wink:

Indeed, (after 2 definitions which seem missing in Nemo):

julia> Base.one(::Type{fmpq})=fmpq(1)
julia> Base.:/(a::fmpq,b::fmpq)=a//b
julia> @btime test_hm(Nemo.fmpz,45)
  79.831 ms (466004 allocations: 15.86 MiB)
true

which is 10 times faster than BigInt!

3 Likes