Julia messes up integer exponents?

The point is that our current Rational implementation does not profit from eliding overflow checking, because every operation involves a gcd already. That is more expensive than a well-predicted branch, by orders of magnitude.

If you want a faster Rational, then you would go the opposite route: Instead of eagerly and automatically reducing, r = reduce(r) would be a thing.

Depending on whether you want your footgun in big Bertha-size, you could then introduce overflow checking; and in case of overflow, you reduce and then attempt the operation again; in case that you have a genuine overflow, the code can decide whether to throw or silently return weird results (that implement the very weird Rational{Int64} semantics).

Also, you would probably go for Int128, in order to have fewer gcd computations. That would probably even be faster.

2 Likes

I am not sure, but it seems, that this question is only falsifiable by one example but not verifiable.

any single operation

could be

(Int)(6//3)

which is not checked for overflows, I guess.

There is a third option… using an intermediate factorization form within the Rational type.

For example, when you multiply many rational numbers (a1//b1)*(a2//b2)*...*(an//bn), then you have to simplify (a1*a2*...*an)//(b1*b2*...*bn) at some point. If you want to make this more efficient, you could separately store each of the new factors [a1,a2,...,an] and [b1,b2,...,bn], and then later when you are done with the multiplication (and have accumulated all your factors) you call a reduce fraction method, which will then look for easy factors to cancel out. This way, you don’t have to take the gcd of very large numbers anymore, but you can work with smaller numbers too, thus requiring less precision ~

This would require an intermediate Rational form which stores multiple factors for the numerator and denominator, instead of multiplying them out into a large number.

If you know ahead of time how many multiplications you are making, then you should be able to pre-allocate a static array with the right number of factors ahead of time.

Naively, this entails keeping a variable-sized vector of numbers which would make this very hard to optimize so any code working with such a type would be at interpreted language speeds rather than the “primitive values in registers” speed that Julia’s known for. If there was a way to make this fixed size and keep the numbers in registers, this could be a promising basis for a fast rational type, especially if one could leverage vector operations to implement operations on such a rational number representation. That would be pretty fancy stuff though.

1 Like

As I mentioned at the end here: theoretically possible, but it would probably be a separate package

EDIT: I’ve got an idea for how it might work, I will make a prototype package to explain the details

Well, to answer my own question, it seems to me that the code of Rationals checks for overflow, excepted
they missed at least one corner case I found:

julia> u=1//typemin(Int)
-1//-9223372036854775808

julia> -u
-1//-9223372036854775808
2 Likes

@Jean_Michel So would you make a PR to fix or document that last corner case, document that Rational is overflow-safe, and one half-sentence that this is not a performance issue because overflow checking is cheaper than a single non-simd integer division anyway? (and gcd automatically precludes SIMD)

That way this entire meandering discussion would have at least one tangible positive outcome :wink:

6 Likes

I do not know yet how to make a PR (I learned git recently just in order to be able to make a Julia package) but I will make an issue. And I cannot vouch that this corner case is the last.

So this is not intended to be taken very seriously, but here is a fun experiment for a Quotient{T} type

https://github.com/chakravala/QuotientFields.jl/blob/master/src/QuotientFields.jl

This implementation is not the most efficient, and could probably be optimized further, here is a test:

julia> using QuotientFields

julia> N = [Quotient(1,n) for n ∈ 1:50];

julia> n = [1//big(n) for n ∈ 1:50];

For + the Quotient{Int} and Rational{BigInt} are not so comparable (it is slowed by qprod)

julia> @time +(N...)
  0.230195 seconds (1.87 M allocations: 32.590 MiB, 52.04% gc time)
13943237577224054960759//3099044504245996706400

julia> @time +(n...)
  0.000453 seconds (1.63 k allocations: 30.602 KiB)
13943237577224054960759//3099044504245996706400

But for * they are much more comparable

julia> @time *(N...)
  0.000833 seconds (1.82 k allocations: 124.891 KiB)
1//30414093201713378043612608166064768844377641568960512000000000000

julia> @time *(n...)
  0.000405 seconds (1.68 k allocations: 32.695 KiB)
1//30414093201713378043612608166064768844377641568960512000000000000

Note that for * the Rational{BitInt} and Quotient{Int} are very comparable, but once the + operation is introduced into the mix then Quotient{Int} is much slower than Rational{BitInt}.

Also note that the Quotient{Int} type can represent extended precision rationals without the usage of any BigInt in its representation (big only gets used for show methods).

From my understanding, now there are two problems:

  1. Using a “safe” version of number datatypes with all validation checks is an expensive operation which slows down the program. And one set of Julia users are ready to understand these intricacies and is ready to appropriately modify their variable types to get accurate answers
  2. Another set of Julia users don’t want to get into the intricacies of computing and just want to get their stuff done “accurately”; they may be ready to take a little speed penalty on the way.

I should say I’m a bit of both the groups. When I work on my side projects, I would be really happy to dive into the intricacies of high-performance code and is happy to learn a lot of new stuff on the way. But when I use Julia for my research purpose/simulations, I will mostly be dwelling on the research problem and may not be in a state to go through the intricacies; at this time, I’m more concerned about the result than how fast I got it. But these two are not isolated anyway; the experience from first usually helps me in the second to write both fast and accurate code. It is always an empowering experience, but at times with slight irritations.

Since Julia is advertised as a “high-performance language for scientific computing”, I believe Julia should take care of both the groups of people to grow. I see that the core Julia developers and Julia community are doing a great job in producing high-quality software and distributing it free of cost. I completely appreciate that. But I believe the current Julia community is most of the “early adopters” and to break into the bigger community, we need to be able to cater to the second group as well.

I understand that each decision regarding the core of Julia is taken after much discussion inside the community and it is perfectly right to keep the core focused on “high-performance”.

What I want to propose is a small trick outside of the core of Julia. Is it possible to define a function, which when called on Julia startup will provide a “safe” environment for scientists? This safe environment will automatically take care of safe types: ie a = 10 after this function call will make a a SafeInt or something similar which includes all the checks that is required for accurate answers?

Examples use case (without safe environment):

julia> 10^19
-8446744073709551616

With a safe environment (hypothetical)

julia > enable_safe_env()      # That hypothetical function which makes all data types Safe
julia > a = 10^19              # At this point, 'a' should automatically be of type 'SafeInt64', without the programmer bothering about it.
ERROR: OverflowError: 10^19
Stacktrace:
 [1] ^(::SafeInt64, ::SafeInt64) at /home/rdeits/.julia/packages/SaferIntegers/eMjmj/src/pow.jl:45
 [2] ^(::SafeInt64, ::Int64) at /home/rdeits/.julia/packages/SaferIntegers/eMjmj/src/pow.jl:71
 [3] macro expansion at ./none:0 [inlined]
 [4] literal_pow(::typeof(^), ::SafeInt64, ::Val{19}) at ./none:0
 [5] top-level scope at none:0

Now, in this case, all that is needed by the second group (who don’t want to get into the intricacies of computing) is just to make sure that there is this function call at the top of all their scripts and they can be sure that their experiments will not have any overflow/underflow problems.

Is it possible to create such a function, maybe as an external package, which can ensure a safe environment? My skills with Julia are just developing and is not enough to know where I should start for such a requirement. I would like to know whether community will be interested in such a solution and can we develop it together?

Regards,
v-i-s-h

PR: https://github.com/JuliaLang/julia/pull/31090

1 Like

I proposed this in a previous post in this topic, even with (hacky) sample code on how to do it. I don’t think it’s an outrageous idea, some other languages allow enabling/disabling overflow checking, for example C#. But I don’t know how feasible it is to do this cleanly in Julia.

Yes, I read that. It’s a good solution.
What I’m proposing is a much broader solution. But like you said

I’m hoping that maybe people with more experience than me can shed some light.

The easiest way would probably not be at startup, but as a separate sysimg / binary. Then you end up with three binaries (julia, julia-debug and julia-overflow).

However: Integers mod 1<<64 are an important object. Currently, all code in the ecosystem conflates whether it wants “integer ingeger”, which would become “safe int” in the modified sysimg, or whether it really wants modular integers. Turning all Int64 into non-wrapping analogues will break a lot of code that relies on the wrapping.

So, for an initial period, the new “julia-overflow” binary would be practically unusable. Until all of the ecosystem has beed updated.

But how do you update a package? We would need three types: Non-checking, checking, and maybe-checking integers. That is a huge amount of complexity for all users!

The cleanest way would be: Int and UInt are the “default”, Int64 is always modular (it even has the wrap-around in the name!), and safe_int{Int64} is always checked, and ModInt would be word-sized modular integer. But that goes completely against all current use of the types. And you would end up with an endless sea of bugs that uglify all code that has integer literals that want to be modular (“replace all t += 137” by t += ModInt(137)").

I don’t think that is feasible at all, in the sense that julia-overflow would never become usable and extract a huge tax on the default julia.

1 Like

I don’t want to use multiple versions of Julia, then what if a package relies on a specific version? Then I need to figure out which packages are compatible with what version of Julia (mess of Pkg isn’t it)?

Now, I’m not talking about version numbers, but things like julia, and juila-overflow etc

You can also make it a command-line flag; whether different binary or command-line flag does not really matter.

The reason for this is that quite a bit of code is generated in codegen.cpp, that would also need to respect the “check overflows” wish, just like --check-bounds={yes|no} is respected. Of course it might be possible to generate a single sysimg that permits runtime switching between checked and unchecked integer semantics, without incurring any runtime penalty in the unchecked case, and without blowing up the size of the sysimg. Count me doubtful.

However, I think that such a change is just as doomed as the (now deprecated) --math-mode=fast flag, because lots of code relies on wrapping. As an analogue, some code relies crucially on ieee semantics, using these juicy processor instructions in non-intuitive ways: Rounding as a feature, not a source of errors (this was the original reason for deprecating the flag).

The other way around is of course feasible: Disable all overflow checking. That would be as simple as having a piratical startup script that eats away all the overflow checks in your safe-int package and things like Base.Core.checked_trunc_uint.

I do not see why the proposal would imply such a mess.

  • All ints would remain Mod 2^64, just the wraparound would be an error for the checking ones.

Assume

  • Int64 are mod 2^64, never checking just like today
  • SafeInt64 always checking, like today
  • Only difference: Int will be checking or not depending on the global flag/julia version, that is Int
    will be an alias for Int64 or SafeInt64 depending. Also the constants will be Int by default.

The only consequence for packages is that Ints which rely on wrapping mod 2^64 will have
to be spelled explicitly as Int64. So a bunch of needed Int->Int64. Certainly no worse than the
transition 0.6->0.7 ?

The main issue is: what is the type of literals?

If literal 23 became SafeInt, then working with Int64 would become exactly as annoying as working with Int32. You can work very well and type-stable with SafeInt, because of the promotion rules: if pred a+= 1 end is type-stable for Int64, BigInt, Int128, SafeInt64. It is not type-stable for Int8, Int16, Int32. Hence, lots of code uses Int64 when Int32 would be perfectly fine. That would become much worse with SafeInt.

1 Like

I’m also not convinced it would have to be such a mess? If you start introducing new types, then sure that sounds complicated, but what if you have a “checked arithmetic” flag that only affects code in the user’s package/module (with the assumption that base code and other packages are already vetted, and you’re only interested in catching bugs in your own code). And that only checks certain operations, e.g. +, -, *, ^. So basically the only change is that the compiler emits overflow-checking code for those operations in user code. You’d also need a way to turn overflow checking off for certain parts of your code (like hash code calculation). If I’m not mistaken this is roughly how it works in C#, where you can enable checked arithmetic on a project-wide basis.

(Not that I’m really advocating for implementing this… Although I’ve coded quite a bit of C#, I never used the overflow checking feature, I am strongly opposed to turning it on by default, and I’m doubtful that OP would have actively enabled such a feature since he never expected integer overflow.)

I think it would only work if it was default, because as you have said Bennedich, if a user is aware of the possible issue, they will also be aware to just use eN rather than 10^N.

Since performance is key, I think it would just be better to make users more aware of the possible tripping hazard.

The hard part is putting the warning somewhere that it will be read. Someone who has programmed in other high level languages is not likely to read the Integer sections in the manual, they will be focused on the parts of Julia that are notably different: multiple dispatch, etc… those are the things that I think get priority from new users to Julia.

What is less of a hassle: experienced users turning off a flag, or new users turning on a flag?

For a moment I forgot who the OP even was…