Tonight I hated julia, REPL and overflows

Hello,

I am a newcommer in Julia. I have read almost entire never ending boring manual, read a couple books, watched some videos, and practicing probability theory with julia with much enthusiasm. I spend great of time to understand the type system and parametric types. But the more I get into the language, the more I get annoyed. I give these issues to my lack of knowledge and try not to give up.

Here is major one:

I was trying to do the calculation of famous birthday problem with julia.

I am supposed to calculate this formula, (my interpretation of solution to julia)

Probability of n people having the same birthday in a room: (It does not matter if the solution is correct, the point is the calculation) It is basically 1 - (Permutation(365 by n)/365^n)

p(n) = 1 - prod(collect(365:-1:(365-n+1))) / 365^n

This overflows, I receive no warning etc. Since the probability is bigger than 1, after checking the compound statement part by part and convincing myself with correct syntax, I realized that.

Then I tried do the math with BigInt at RELP. Julia says no, I can not create typed variable at global scobe in RELP, (why? I am sure there is very detailed reasoning but, hey this language supposed to replace other math environment, and help people interactive analysis right?)

I bumped into chicken and egg problem. The equation overflows, but I can not create typed (BigInt) variables to handle the issue? Now I had to go back and write a script with BigInt etc. instead of doing in one line on the console.

After strugling with no method errors for BigInt, I could finally calculated the result.

I am novice in Julia but I am quite sure there can be something done.

If variable can overflow, then provide some sort of typing mechanism for RELP at global scope, give exception or some sort of control mechanism for that and let the programmer know.

If checking overflow is too costly in performance, then let the programmers decide what to do, or auto promoto to BigInt. Many languages automatically do that.

Make overflow check a configuration parameter or something. Overflow problem can be very difficult to track in bigger and deep calculations.

I believe such a promising new programming language, claiming the best of everthing, ease of use, speed etc. at 2020 should handle this.

In my part, I have seen not much ease of use so far other than zillions of cryptic ugly looking method errors and types issues with does not help much contrary to my expectation for identifying errors in so called typed language. (I am really fed up trying to decipher cryptic constructors by looking damm source codes of modules)

I have watched a video on youtube stating on Julia con stating that julia is the has the most type sharing language. I find it hard to believe. Other than native types and some well knows ones, DataFrame, I see zillions of different things. I am always forced to figure out some weird constructors.

This are my too cents for anyone interested.

Sincerely.

3 Likes

What you are talking about has nothing to do with what you need. I’m not saying it’s completely trivial to discover what you need but you should realize you might be doing things wrong rather than straight claiming that what you need is impossible. Basically none of the issues you claimed follows this sentence makes much sense because of it.

There’s no typed variables in global scope but that’s just a performance issue. You want to create a BigInt in global scope so you should just do that instead and. It has nothing to with variable types. Searching BigInt in the doc gives planty of relavant results

Also, I think you just want to use floating points.


And you give no detail of any other issues you mentioned so I’ll just treat those as empty claims.

9 Likes

Instead of BigInt, you could use floats (note the 365. and 365.0) (and you don’t need collect()):

julia> p(n) = 1 - prod(365.:-1:(365-n+1)) / 365.0^n
p (generic function with 1 method)

julia> p(25)
0.5686997039694639

Note that this behavior is very common in “other math environments”; for example Matlab also overflows with no warning:

>> n = int32(100)

n =

  int32

   100

>> 1 - prod(365:-1:(365-n+1)) / 365^n

ans =

  int32

   -2147483646

The best approach is to be cautious when dealing with potentially large integers, or include your own checks in your code.

6 Likes

Hi, welcome!
To find why you got no warning about overflow, google: julia overflow integer. Eg, here is a good thread A plea for int overflow checking as the default

I think the following does what you want

julia> p(n) = 1 - prod(collect(365:-1:(365-n+1))) / 365^n;

julia> p(big(23))
0.50729723432398540722...

This sends an argument of type BigInt. Because of promotion rules, all of the relevant calculations are done with BigInts.

14 Likes

The information you provided aside, I find the tone in your post to be quite condescending. A reference to a “never ending boring manual” is unnecessary. A lot of people on here reply in between or after their regular day jobs — this is supposed to a nice, fun place to chat about all things Julia, resolve issues, help newcomers with code and be in the know about what’s next. Please try to keep this in mind when you post and let’s try to keep discourse as a positive place and not as a forum to vent.

30 Likes

I have never seen an error message like that - all variables we create have concrete types. (You cannot redefine composite types without a REPL restart in the global scope, that’s true.) In most cases, you can call type names as constructors, in this case eg. BigInt(5). As jlapeyre mentioned, big(5) works here, too.

1 Like

I think that using non-overflow checked integers as default in Julia is a very reasonable choice. For the vast majority of cases, there is no risk of overflow and the performance impact of overflow checking would be large (it probably prevents using SIMD).
If there is a risk of overflow, there are several options:

  1. BigInt
  2. Int128
  3. https://github.com/JeffreySarnoff/SaferIntegers.jl

See here for an explanation of the Julia devs:
Frequently Asked Questions · The Julia Language

Edit: missing word added

4 Likes

You are misunderstanding this piece of information. Look here:

julia> N = big(365)
365

julia> typeof(N)
BigInt

The global scope typing issue doesn’t have anything to do with being able to create variables with values of a certain type (I mean, all values have types in Julia.) The global scope issue is that the compiler cannot be certain of the type of a global variable (unless it is marked with const), and therefore may not produce optimal code. Values still have types.

Anyway, you don’t need to worry about global scope, just use the BigInt inside your function, like this, directly inside the REPL:

julia> p(n) = 1 - prod(big(365):-1:(big(365)-n+1)) / big(365)^n

julia> p(250)
0.9999999999999999999999999999999999999999999999999977070781298158485920876595337

And don’t use collect.

Those languages are slow, as far as I know.

6 Likes

BTW, did you consider

p(n) = 1 - prod(((365-n+1):365)./365)

?

7 Likes

Also relatedly - if you use integers not floats then probably you want an exact result and Rationals do overflow checking, e.g.:

julia> p(n) = 1 - prod(((365-n+1):365) .// 365)
p (generic function with 1 method)

julia> p(250)
ERROR: OverflowError: 172615601860890625 * 365 overflowed for type Int64

julia> p(big(250))
60783318200087226682937766049236689927740288880324835641180939615840853077305302521582319761155880382046903410599287510714689056871860382372225871467823802914855209569454994856436325374703989916595040572788734599190036892154393098851062184082513098323681767075863597647294833943993555894766923567584718008946122619845493345870917482420692554481460496906235349146103378704888738550433322306587655456114611855484502667198832077708058006305795667519839602235607937824602274322926446252809334379850668885439289797448589884630194391593593876505256143728639011534958382838870632072538540936993153//60783318200087226682937766049236689927740288880324975012580582958212446162716161836332473493159082365590413362393641358116382303381317690170637169128428413619221450730936945406541254725982582727849529518046291781121515427755871228399939736312309859624863560499391580308070781606949306094035633247290983152192747328483925898910874964142340073691863149637882661801964467606121912815254089089195376769760963984310009200305106751293061539714601508717912733593990136950258598307824874931637016947890842883929833404644035966434246851299469922100352851077921523259561809027218259871006011962890625

julia> float(p(big(250)))
0.9999999999999999999999999999999999999999999999999977070781298158485920876595337
7 Likes

I think that this is worth emphasizing, except that the FAQ entry about overflow is already very, very long so I would hesitate to add to it.

Many people new to programming (not specifically Julia) think of Int as the integers \mathbb{Z}. This misleading: even seemingly simple operations like 21! will overflow, as

julia> factorial(big(21)) > typemax(Int)
true

Instead, it is best to think of Int as a type for counting and indexing objects or operations where one would run out of memory or time well before reaching typemax(Int64). For example,

  1. indexing bytes with Int64 one can cover the exabyte range, well beyond the practical limits of computers today,

  2. a loop body taking a nanosecond can be iterated with an Int64 for about 300 years.

Int is safe where these kind of constraints will be binding earlier than typemax(Int) (all of the above applies to 32-bit machines with relevant changes).

In all other cases, one should use something else. That said, the use case of Int is so common that it is a basic building block of almost all practical languages, and is a reasonable default.

10 Likes

Slightly off topic, but related in terms of thinking.

Last week I talked with my son (who is in high school) about comparing numbers. And he told me that he was taught that floats should not be used in comparisons so he always designs his code to avoid doing it.

What was striking is that he was taught “do not do it” rather than “if you do it remember it is unsafe sometimes”. We have a similar situation here I think - first one has to understand that something should not be done (i.e. doing arithmetic on Int when they are large, especially on 32-bit system) and then learn that sometimes it is safe to do.

Now I asked him about integer overflow - and he told me that this is the second basic principle (apart from comparing floats) that he was taught at the very beginning - integers might overflow and he should always make sure he does not hit this problem.

PS. I do combinatorial graph theory a lot in Julia and in what I do I routinely would overflow Int64. Still I find Julia a much better language for the job than either Python or C.

7 Likes

It could be we misunderstand this post, perhaps @benibilme just needed to vent a second.

Hey @benibilme
asking for help works better than complaining about the language in general terms. But I see that you were frustrated which is understandable. From my experience most users of Julia get a good mental model of how most (not all) things work quite fast, so don’t give up too early. And maybe trust us a bit in that things in Julia fit quite well together how they are.

18 Likes

You will never have a language that is reasonably fast and defaults to arbitrary precision arithmetic. That isn’t just an unsolved problem, it’s fundamentally unsolvable for clear reasons. Basically, the issue boils down to the fact that if you do arithmetic with something that is arbitrary precision, you have to have a variable bit size, since otherwise someone could always give you a bigger number! But if you have a variable bit size, there is no longer a clear answer to “how big is an array of 10 integers?”. That answer is no longer type dependent but value dependent: it depends on which integers! So it’s now impossible for a compiler to create such an array in a dense style, or know the register for the 5th element.

To counteract this, you can make all of your numbers not be bits but objects, moving them from the stack to the heap. This means that they are no longer free to make and will require GC control, and there’s like a 100x performance penalty at this size.

Basically, most things would clearly be unusable if that was the true answer, so your CPU hardcoded 64 bit integer operations to be fast. This means all reasonably fast languages (C, Fortran, Rust, Julia, and beyond) will use 64 bit integers in nicely packed CPU optimized ways. And 64 bits is large enough that looping to the end of them is nearly impossible, so the only reasonable way to overflow them is with combinatorics problems (which is why hardware manufacturers settled on this value).

But indeed, since the size of combinatoric numbers grow exponentially, it’ll be hard to ever satisfy the combinatorics community. Would you want every calculation on your computer to take 5x time and 5x memory and be at a higher bitsize, just so the birthday problem works by default? Probably not. So no, this problem is not solved in full in 2020 because it is actually ridiculously (exponentially) difficult.

The solution here is to use the standard fast arithmetic but make it very easy to opt into slower variable-sized arithmetic. Other answers demonstrated how Julia’s type specialization makes a generically duck typed algorithm automatically recompile for BigInt inputs (and Rational!), so if your problem needs it, just give the input type it needs. Or use SafeIntegers which will give an overflow warning (for a cost of course)

15 Likes

There is an intermediate approach, used eg in Common Lisp: reserve a few (eg 2) bits for a “type tag”, use the rest as either representing an immediate value (eg a 62-bit integer) or a pointer to the heap (eg addressing memory in 4-byte chunks). This is of course still orders of magnitude slower than native Int64, but can break out to bignums on demand, avoiding them entirely for smaller integers.

Obviously this breaks zero-cost wrapper types that are common in Julia, so it is only of historical interest. This is indeed a problem that has no ideal solution, so various languages pick different points on the performance/convenience/safety frontier.

5 Likes

It would be nice to have hardware support for this.

2 Likes

We had with Sparc.

7 Likes

Oh cool, never knew, but I found it mentioned:

Tagged add and subtract instructions perform adds and subtracts on values checking that the bottom two bits of both operands are 0 and reporting overflow if they are not. This can be useful in the implementation of the run time for ML, Lisp, and similar languages that might use a tagged integer format.

So maybe we just recommend people use SPARC chips? :troll:

7 Likes

Afaik, OpenSparc exists in the CS research world, as well as tagged ISA for RISCV variants (~3 years ago). The ideas about tagged never went away, but back then on classic sparc 1990…2005 the use cases were sparse …

1 Like

The plan is:

  1. Julia takes over all high-performance computing,

  2. CPU manufacturers optimize their products for Julia.

:wink:

27 Likes