Bug? parse(Integer, ...) calls typemax

Is this a bug?

parse(Integer, "1234567890987654321234567890")
ERROR: MethodError: no method matching typemax(::Type{Integer})
Closest candidates are:
  typemax(!Matched::Union{Dates.DateTime, Type{Dates.DateTime}}) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Dates\src\types.jl:426
  typemax(!Matched::Union{Dates.Date, Type{Dates.Date}}) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Dates\src\types.jl:428
  typemax(!Matched::Union{Dates.Time, Type{Dates.Time}}) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Dates\src\types.jl:430
  ...
Stacktrace:
 [1] tryparse_internal(#unused#::Type{Integer}, s::String, startpos::Int64, endpos::Int64, base_::Int64, raise::Bool)
   @ Base .\parse.jl:128
 [2] parse(::Type{Integer}, s::String; base::Nothing)
   @ Base .\parse.jl:241
 [3] parse(::Type{Integer}, s::String)
   @ Base .\parse.jl:241
 [4] top-level scope
   @ none:1
julia --version
julia version 1.6.0

Did you mean parse(Int,...)?
(which will overflow, btw)
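To illustrate the overflow remark, here is a minimal sketch assuming a 64-bit `Int`. The variable names `s` and `overflowed` are only for illustration; `parse`, `tryparse` and `OverflowError` are from Base.

```julia
s = "1234567890987654321234567890"

# parse throws when the value does not fit in the requested type.
overflowed = try
    parse(Int, s)
    false
catch e
    e isa OverflowError
end

# tryparse returns nothing instead of throwing.
tryparse(Int, s) === nothing

# Int128 is wide enough for this particular value.
parse(Int128, s)
```

So the string is a valid integer literal; it is just too large for `Int`.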

Oops. Integer is an abstract type. I still think I should be able to parse to Integer and get the subtype that the value fits in.

This works

parse(BigInt, "1234567890987654321234567890")
1234567890987654321234567890

but this

typeof(parse(BigInt, "4"))
BigInt

seems wasteful.

Integer can have infinitely many subtypes, so what should Julia do? Dynamically call subtypes, check each typemax, sort them, and find the smallest one that fits? Seems wasteful.
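For anyone following along, a small sketch of why `typemax(Integer)` has no sensible answer. It only uses functions from Base and is not taken from the posts above.

```julia
# Integer is abstract: it has no fixed size, so there is no largest value.
isabstracttype(Integer)   # true
isconcretetype(Int8)      # true

# typemax is only meaningful for bounded concrete types.
typemax(Int8)             # 127
typemax(Int64)            # 9223372036854775807

# BigInt is arbitrary precision, so parse(BigInt, s) needs no upper bound.
parse(BigInt, "1234567890987654321234567890") > typemax(Int64)   # true
```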

Yup. Good point.

And if that is what you want, you can of course do

T = Int8

while true
    try
        x = parse(T, s)
        break
    catch e
        e isa OverflowError || rethrow(e) # parse throws OverflowError when the value does not fit in T
        T = widen(T)
    end
end

A workaround to fix this is to write

parse(Int, "4")

instead of specifically asking for a BigInt.

Int isn’t less specific than BigInt (on a given system). I think OP’s point is they want the smallest Integer type which can represent the number in a string.

Read it again:

He’s suggesting that this should not return a BigInt.

I read that to mean that the more inclusive method parse(BigInt, stringvariable) is wasteful for small values of stringvariable.

Maybe, but that depends on the use case. One can easily imagine cases where combinations of small integers overflow.

Anyway, I read it as a request that the compiler should override parse(BigInt, "4") to return a smaller type. Can you clarify, @Mark_Nahabedian?

At my computer now, so I could test it out:

function minimalparse(s)
    T = Int8
    while true
        try
            return parse(T, s)
        catch e
            e isa OverflowError || rethrow(e)
            T = widen(T)
        end
    end
end

julia> typeof(minimalparse("123"))
Int8

julia> typeof(minimalparse("1234"))
Int16

julia> typeof(minimalparse("123456"))
Int32

julia> typeof(minimalparse("123456789012"))
Int64

julia> typeof(minimalparse("123456789012345678901234"))
Int128

julia> typeof(minimalparse("12345678901234567890123456789012345678901234567890"))
BigInt

Obviously this is not type stable. It may very well be more performant to use a “too large” type for many applications.

I’m sorry. From the discussion, it’s clear I wasn’t really thinking.

A CommonLisp implementation would use FIXNUM until it overflowed into BIGNUM. The built-in types in CommonLisp aren’t extensible though.

Given that Integer is subtypable, the Julia implementation can’t guess which subtypes are most appropriate for a given application. I suppose if a developer really didn’t know the range of integers they were dealing with, and wanted a minimal size integer, they could implement their own function to do that, or specialize parse.

One possible behavior would be to start with Int until the number being parsed overflows and then resort to BigInt. That would be analogous to the CommonLisp implementation, would represent all integers, and those that fit would be represented in the manner most suited to the architecture. This is not consistent with Julia’s wrap-on-overflow behavior though. It’s been so long since I’ve dealt with anything near the machine level that I have no clue whether machines still allow trap on arithmetic overflow or whether any OS or language runtime exposes that to the programmer.

I suppose at the hardware level the “extra” most negative two’s complement integer could be treated as an overflow indicator analogous to the various floating point NaNs. I doubt this could be done without adding gate delay to what is probably the most fundamental operation in computation.
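The "Int first, then BigInt" idea can be sketched in a few lines. `parse_int_or_big` is a hypothetical name, not something in Base, and the example assumes a 64-bit `Int`.

```julia
# Hypothetical helper: parse to Int when the value fits, otherwise to BigInt.
function parse_int_or_big(s::AbstractString)
    x = tryparse(Int, s)          # nothing on overflow or invalid input
    x === nothing || return x
    return parse(BigInt, s)       # throws ArgumentError for invalid input
end

typeof(parse_int_or_big("4"))                              # Int64
typeof(parse_int_or_big("1234567890987654321234567890"))   # BigInt
```

The return type is a small union (`Int` or `BigInt`), so it is still not type stable, but it only ever tries two types.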

Can you tell I’m having trouble falling asleep?

DNF: I don’t think I have a clear, coherent request.

Gustaphe: Pretty neat.

I don’t understand how the existence of OverflowError and its behavior in gustaphe’s example is consistent with what I read in the Julia numbers doc about wrap-around. See Integers and Floating-Point Numbers · The Julia Language under “Overflow Behavior”. Can someone clarify this inconsistency?

I think that specifically applies to arithmetic. When doing a + b, it doesn’t check for overflow. parse(T, s), on the other hand, is assumed to be a rare and expensive enough operation that the overflow check is not a big overhead. Another way to view it is that there is a pretty clear and consistent way to interpret addition of two large numbers, but there is no obvious answer to what Julia should do with parse(Int8, "1234"). It can only do normal overflow if it parses the entire number into a larger type (or somehow divides it into some arithmetic operations). An addition, on the other hand, doesn’t need to know that it’s overflowing, which is why the choice was made to not error.
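A short sketch of the difference, using only Base functions. The wrap-around variant at the end is my own illustration of the "parse into a larger type" remark, not something parse does for you.

```julia
# Fixed-width arithmetic wraps silently.
typemax(Int8) + Int8(1) == typemin(Int8)   # true

# parse refuses a value that does not fit.
parse_failed = try
    parse(Int8, "1234")
    false
catch e
    e isa OverflowError
end

# Wrap-around "parsing" needs a wider type first, then truncation.
wrapped = parse(Int, "1234") % Int8        # 1234 mod 256 = 210, reinterpreted as Int8
```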

On x86, many instructions will set a flag to indicate this: CF (the carry flag) for unsigned overflow and OF (the overflow flag) for signed overflow. inc (increment) and dec (decrement) leave CF untouched, for example, but add/sub set it.
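Julia does expose overflow detection if you ask for it, through the `Base.Checked` module. A minimal sketch follows. Whether it compiles down to a flag check is up to the compiler and the target, so treat that part as an assumption.

```julia
using Base.Checked: add_with_overflow, checked_add

# Returns the wrapped result together with an overflow flag.
r, flag = add_with_overflow(typemax(Int), 1)
# r == typemin(Int), flag == true

# checked_add throws OverflowError instead of wrapping.
threw = try
    checked_add(typemax(Int), 1)
    false
catch e
    e isa OverflowError
end
```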
