Julia messes up integer exponents?

To tell someone that Julia is as easy as Python, and then tell them, sorry, but you also need to understand numerical precision, is a bit of a stretch. (That isn’t really all that much to ask, but the statement “understand the tool” implies more than knowledge of a single aspect.) Julia is fairly easy, but this caught me off guard, and I think it will catch a lot of people.

You can’t advertise in a way that is meant to appeal to naive scientists and then scoff at them when they don’t understand things typical of a computer science degree. If you want to use the phrase “easy to use”, you need to make it easy to use. My hat is off to Julia; it is very nice (I am pro-Julia). I just think it could use some sort of flag on this aspect.

I didn’t mean to upset anyone with my ignorance, and I have learned a lot.

What is a computer’s favorite beat? … An algo-rhythm.

2 Likes

Summing a vector of 32-bit ints vs. the same vector of SafeInts shows a 430% slowdown.

Something like:

will not happen, but a PR that checks for overflow in integer exponentiation (with benchmarks) would be something useful to work on.

6 Likes

And this is a particularly dangerous operation to do without SafeInts

1 Like

So is plus between two floats. Arbitrary error possible.
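A small illustration of the point about float addition, using only Base and a value chosen for convenience (2^53, where Float64 stops representing every integer): the sum silently loses the 1.0, so the difference comes out as 0 instead of 1.

```julia
# 2^53 is the point where Float64 stops representing every integer exactly.
x = 9007199254740992.0           # 2^53

absorbed = (x + 1.0 == x)        # the 1.0 is rounded away entirely
difference = (x + 1.0) - x       # mathematically 1, computed as 0.0

println("x + 1.0 == x     is ", absorbed)
println("(x + 1.0) - x    is ", difference)
println("0.1 + 0.2 == 0.3 is ", 0.1 + 0.2 == 0.3)
```

No error or warning is raised in any of these cases, just as with integer overflow.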

Anyway, I’m not sure there is that much more to discuss? Julia has made the conscious choice to use native integer operations. For those who do number theory there are BigInts, and for those who are worried about overflow there are SafeInts. Specific cases where overflow can be checked without significant performance cost are useful to work on.

4 Likes

Well, that is a different topic, but BigInts in Julia are actually too slow (about 250 times slower than Int). Other languages have BigInts that are 50 times faster because theirs are roughly what a Union{Int,BigInt} type would be in Julia. Unfortunately, such a type has not yet been implemented (and the implementation is currently beyond my skills). But SafeInts are very useful: with them, you can switch to BigInts only when needed.
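A rough sketch of the "BigInt only if needed" idea, using `Base.Checked.mul_with_overflow` rather than the SaferIntegers package. The helper name `mul_or_big` is made up for illustration, and the return type is deliberately a `Union{Int64,BigInt}`, so this is not type stable:

```julia
using Base.Checked: mul_with_overflow

# Hypothetical helper: multiply in Int64 and widen to BigInt only on overflow.
function mul_or_big(a::Int64, b::Int64)
    r, overflowed = mul_with_overflow(a, b)
    return overflowed ? big(a) * big(b) : r
end

small = mul_or_big(Int64(3), Int64(4))            # stays an Int64
large = mul_or_big(Int64(10)^18, Int64(10)^5)     # would overflow, so BigInt

println(typeof(small), " ", small)
println(typeof(large), " ", large)
```

The fast path costs one overflow flag check per multiply; only the rare overflowing case pays for a BigInt allocation.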

Also, it is not quite true that

Some operations on Rationals in Base are overflow checked, as are a few others (and you propose that integer exponentiation could be checked too). Actually, it would be good to keep the possibility of doing such operations without any overflow checking, to get the utmost speed, but to have the default Int check.

3 Likes

Re Avogadro’s number: the right type post-redefinition is probably Int128?

julia> a=6.02214076e23
6.02214076e23
julia> round(Int128, a)
602214075999999987023872
julia> 602214076*Int128(10)^(23-8)
602214076000000000000000
julia> Float64(ans)==a
true

I have no clue why these guys decided to spec an integer that cannot be represented in Float64 without rounding. I mean, WTF?

4 Likes

Most Julia users are not computer scientists, and this is not a big issue lots of people are struggling with. I’m not a computer scientist, and I have no problem understanding this. Something being unexpected is not the same as being hard to understand.

I mostly find the “only a computer scientist”-card that sometimes gets pulled unconvincing. Understanding the type system or compiler passes or IR or whatever might be computer scientist stuff. Integer wrap-around, hardly. If you can understand the clock or a calendar, you can understand integer wrap-around.
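To make the clock analogy concrete, here is a short check (Base only, with Int64 spelled out explicitly) that the wrapped result is just the exact result on a "clock" with 2^64 positions:

```julia
# One step past the largest Int64 lands on the smallest one.
wraps = typemax(Int64) + 1 == typemin(Int64)

wrapped = Int64(10)^23      # native 64-bit arithmetic, wraps silently
exact   = big(10)^23        # arbitrary precision, no wrap-around

# The wrapped value is the exact value modulo 2^64. (In this case the
# remainder is below 2^63, so it also shows up as a positive Int64.)
same_on_clock = wrapped == mod(exact, big(2)^64)

println(wraps)
println(wrapped)
println(same_on_clock)
```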

6 Likes

@kristoffer.carlsson I see that one may run 7 times using system Ints in the time it takes to run once using SaferIntegers. When it’s my $ at risk, … :cowboy_hat_face:


julia> int64vec = Int64.(rand(
    collect(-typemax(Int32):one(Int32):typemax(Int32)),
    1_000_000));

julia> safe64vec = SafeInt64.(int64vec);

julia> round( 
         (@belapsed sum($safe64vec)) / 
         (@belapsed sum($int64vec)), sigdigits=2)
3.7

julia> int32vec = Int32.(rand(Int16, 10000));
julia> sint32vec = SafeInt32.(int32vec);

julia> @btime sum($int32vec)
  774.327 ns (0 allocations: 0 bytes)
829512

julia> @btime sum($sint32vec)
  4.814 μs (0 allocations: 0 bytes)
829512

julia> round(4814/774,sigdigits=3)
6.22

That is good for you.

It is unfortunate that it turned into a scientist vs computer scientist thing. It was only meant to be a 6.023*10^23 vs 1.20687e18 thing, and that has been answered. If you use 6.023*10^23 you will get the wrong result without knowing it. If you are lucky and catch the error despite the lack of flags, you can dig around and find that you need to use either 6.023*10^23.0 or 6.023e23 or big() or SafeInt. You might even find the answer through a Google search leading you to this thread, which is its reason for existing.
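For anyone landing here from a search, a side-by-side of the spellings mentioned above (SafeInt is left out because it needs the SaferIntegers package; `Int64(10)` is written out so the result does not depend on the machine's default Int):

```julia
wrong  = 6.023 * Int64(10)^23    # integer power wraps first: about 1.20687e18

lit    = 6.023e23                # float literal
fpow   = 6.023 * 10^23.0         # float exponent forces a float power
bigpow = 6.023 * big(10)^23      # BigInt power, result is a BigFloat

println("6.023 * 10^23       = ", wrong)
println("6.023e23            = ", lit)
println("6.023 * 10^23.0     = ", fpow)
println("6.023 * big(10)^23  = ", bigpow)
```

The last three agree up to floating-point rounding; only the first is off by orders of magnitude.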

2 Likes

I apologize if I’m wrong, but I believe it was you who made it that way.

Does anyone do that? Large floats are always written 6.023e23.

3 Likes

This level of assumption is what bothered me. The code assumes this is the case. I think that is dangerous. Obviously I will not make that mistake now, but assuming this is true for all…

The fact that 6.023*10^23 returns 1.20687e18 is the sole reason for the post.

One way of making 6.023*10^23 behave as “expected” would be if the parser lowered 10^23 into something like @integer_power_str("10^23").

1 Like

This is an interesting suggestion. If I understand right the performance-critical Int things are things like indexing arrays, which would not involve ^. And I imagine it would be very rare to want wrap-around behaviour here. So adding a guard rail to catch this sounds possible.

Does anyone have a link handy to where this is summarised?

1 Like

Well, as I said, this is only about things being unexpected and unfamiliar. If you’re new to Julia, you also have to learn about the difference between 1 and 1.0. How is that any easier?

There is nothing boastful about me saying I’ve no problem understanding integer wraparound. Concepts like variables, functions, types, modules, etc. are far, far deeper and harder to grasp.

1 Like

Edited:

If I understand right, this becomes Base.literal_pow(^, 10, Val(23)), so the power, but not the base, is known at compile time. I now think you were suggesting to capture both, so that the entire check could be done ahead of time?

But without touching the parser, the easy thing would be for this function to check for overflow at run-time. Perhaps this should not happen for ^2, but only for higher powers?
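A minimal sketch of what a run-time check could look like, written as a standalone function instead of a change to Base. The name `checked_pow_sketch` is made up, and this is plain exponentiation by squaring on top of `Base.Checked.checked_mul`, not whatever a real PR would do:

```julia
using Base.Checked: checked_mul

# Hypothetical helper: integer power that throws OverflowError instead of wrapping.
function checked_pow_sketch(base::T, p::Integer) where {T<:Integer}
    p < 0 && throw(DomainError(p, "negative exponents are not handled in this sketch"))
    result = one(T)
    b = base
    while p > 0
        if isodd(p)
            result = checked_mul(result, b)
        end
        p >>= 1
        # Square only if the square is still needed; otherwise the last
        # squaring could overflow even though the final result fits.
        if p > 0
            b = checked_mul(b, b)
        end
    end
    return result
end

println(checked_pow_sketch(10, 18))     # fits in Int64

try
    checked_pow_sketch(10, 23)          # does not fit in Int64
catch err
    println("caught: ", typeof(err))
end
```

Whether this is cheap enough compared to the unchecked version would need benchmarks, as noted earlier in the thread.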

1 Like

The Avogadro number will disappear in just 3 months… from Wikipedia:

“Pending revisions in the base set of SI units necessitated redefinitions of the concepts of chemical quantity. The Avogadro number, and its definition, was deprecated in favor of the Avogadro constant and its definition. Based on measurements made through the middle of 2017 which calculated a value for the Avogadro constant of N_A = 6.022140758(62) × 10^23 mol^−1, the redefinition of SI units is planned to take effect on 20 May 2019. The value of the constant will be fixed to exactly 6.02214076 × 10^23 mol^−1.”

This isn’t about the precision of a number… Avogadro’s number is just an example. It is about the fact that raising a number to a high integer power will give a seemingly random value if you write it as 10^N rather than eN, where N is the integer. No flags are raised, and you conduct your calculation thinking you are using the constant you defined, but the machine is actually using a different number because your integer exponent was too large. In the case of Avogadro’s number, my machine assigns it a value that is 5 orders of magnitude smaller, so all calculations using this number are wildly wrong. It happens for any base raised to a large enough integer power, though.
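To see where this starts for powers of ten, one can compare the native result against BigInt (Base only; Int64 is spelled out explicitly, and the range 1:30 is an arbitrary choice for the search):

```julia
# Smallest n for which the Int64 result of 10^n differs from the exact value.
first_bad = findfirst(n -> Int64(10)^n != big(10)^n, 1:30)
println("first overflowing exponent: ", first_bad)

# Native integer power next to the float power around that exponent.
for n in (first_bad - 1):(first_bad + 1)
    println(n, "  ", Int64(10)^n, "  ", 10.0^n)
end
```

Everything up to 10^18 is exact in Int64; from 10^19 onward the integer result has wrapped.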

A minimal guard-rail might be this: source-code literal integer powers are already lowered differently. For p^n where both p and n are source-literal integers, check for overflow during lowering/compilation (where we would like to compute this as a constant anyway) and then emit a lowering/compile/interpreter error, like

julia> ff()=10^23
julia> @code_native ff()
ERROR: Invalid literal "10^23" due to overflow. Write either `Int64(200376420520689664)` or `Int128(100000000000000000000000)`

Also, for some reason the compiler does not understand that bitstype-integer powers are pure, e.g. ff()=5^17.

Of course a different way of suppressing this guard-rail would be (10+0)^23 or 10^(23+0), which are not lowered as literals.

3 Likes

I’ve suggested in the past to have integer overflow checking in pow only, because it’s a fairly expensive operation (compared to add/multiply) and the most common reason somebody may encounter this.

15 Likes

Yes, if the parser captures both base and exponent, then the type can be different depending on the result, so 10^18 could be an Int64 and 10^23 could be an Int128, for example. This would not be type stable if done at run-time. (Edit: Maybe it would work at run-time, with some @pure magic and constant-folding?)
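The "capture both at parse time" idea can be imitated today with an ordinary macro. The following is only a sketch of the concept: `@intpow` is a made-up name, not something in Base, and the Int64 → Int128 → BigInt ladder is one possible choice of widening rule. Because the choice happens at macro-expansion time, each call site ends up holding a constant of one concrete type.

```julia
# Hypothetical macro: evaluate <integer literal>^<integer literal> exactly at
# expansion time and splice in the narrowest of Int64 / Int128 / BigInt that fits.
macro intpow(ex)
    ok = ex isa Expr && ex.head === :call && length(ex.args) == 3 &&
         ex.args[1] === :^ && ex.args[2] isa Integer &&
         ex.args[3] isa Integer && ex.args[3] >= 0
    ok || error("@intpow expects a literal integer power such as 10^23")

    v = big(ex.args[2])^ex.args[3]          # exact value, no wrap-around
    if typemin(Int64) <= v <= typemax(Int64)
        return Int64(v)
    elseif typemin(Int128) <= v <= typemax(Int128)
        return Int128(v)
    else
        # Rebuild the BigInt when the code runs instead of embedding the object.
        return :(parse(BigInt, $(string(v))))
    end
end

println(typeof(@intpow 10^18))
println(typeof(@intpow 10^23))
println(typeof(@intpow 10^40))
```

Note that `@intpow (10+0)^23` is rejected by this sketch, since the base is then no longer a literal.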

1 Like