What is the effect of `parse` on `Char` types?

`parse(T::Type, str::String)` converts a string `str` to a number of type `T`. However, `parse(T::Type, c::Char)` also returns numbers, but I’m not sure where they’re coming from and I can’t find a matching method signature in the documentation. For example, `parse(Int, 'a')` and `parse(Int, 'A')` both return 10, which isn’t the standard Unicode code point for either of those characters (it’s not hex notation either, because it goes past `'f'`). What do these numbers mean?
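For concreteness, this is what I’m seeing (v0.6; the code points shown for comparison):

```julia
julia> parse(Int, 'a')
10

julia> parse(Int, 'A')
10

julia> Int('a')    # the actual code point
97

julia> Int('A')
65
```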

Changing the base changes the behaviour. It seems to default to base 36 for the `Char` type, with 10 to 35 mapped from 'A'/'a' through 'Z'/'z' (case-insensitively). Bases up to 62 are also accepted, and once the base goes above 36 the lowercase letters become distinct digits, with 'a' as 36 and 'z' as 61. For example:

```julia
parse(Int, 'A', 36) == parse(Int, 'a', 36)  # true: both are 10
parse(Int, 'A', 36) != parse(Int, 'a', 37)  # true: 'a' is 36 in base 37
parse(Int, 'z', 37)  # error: 61 is not a valid base-37 digit
parse(Int, 'A', 10)  # error: 10 is not a valid base-10 digit
```
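A few more values, following the same pattern, to make the mapping explicit:

```julia
parse(Int, '9', 62)  # 9  -- '0':'9' are always 0-9
parse(Int, 'A', 62)  # 10 -- 'A':'Z' are always 10-35
parse(Int, 'a', 62)  # 36 -- 'a':'z' become 36-61 once base > 36
parse(Int, 'a', 36)  # 10 -- ...but fold back onto 10-35 for base <= 36
parse(Int, 'z', 62)  # 61
```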

Try

```julia
@edit parse(Int, 'a')
```
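If you only want the method’s signature and location rather than an editor window, `@which` does the same lookup:

```julia
# prints the matching method and the file/line where it is defined
@which parse(Int, 'a')
```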

This is a bit of a disaster and should really be fixed. Inconsistency of meaning between different bases is pretty bad. Issue filed: https://github.com/JuliaLang/julia/issues/26571.

I thought I’d try writing a compatible replacement, but there’s a little hitch with combining optional positional arguments and keyword arguments on v0.6.2: they more or less need to be one or the other, so this doesn’t replicate `parse`’s signature exactly (keyword arguments can still be optional, though; a forwarding shim is sketched after the definition below).
It’s a bit slower, too.

```julia
function parse_C(::Type{T}, c::Char; base::Integer=36, legacy_fast::Bool=true,
                 alphabet::Vector{Char}=collect('0':'9'),
                 alphanum::Vector{T}=T.(0:9)) where T<:Integer
    if legacy_fast
        # the "legacy" branch mirrors the stock behaviour described above:
        # lowercase letters fold onto 10-35 for base <= 36, otherwise they
        # start at 36
        2 <= base <= 62 || throw(ArgumentError("invalid base: base must be 2 ≤ base ≤ 62, got $base"))
        a = base <= 36 ? 10 : 36
        d = '0' <= c <= '9' ? c - '0'      :
            'A' <= c <= 'Z' ? c - 'A' + 10 :
            'a' <= c <= 'z' ? c - 'a' + a  : throw(ArgumentError("invalid digit: $(repr(c))"))
        d < base || throw(ArgumentError("invalid base $base digit $(repr(c))"))
        return convert(T, d)
    else
        # look the character up in a user-supplied alphabet instead
        xval = indexin([c], alphabet)[1]    # 0 when not found (v0.6)
        xval > 0 || throw(ArgumentError("char $c not in alphabet $alphabet"))
        xval <= length(alphanum) || throw(ArgumentError("index $xval not in alphanum $alphanum"))
        return alphanum[xval]
    end
end
```
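As a possible workaround for the positional-base hitch mentioned above, a thin forwarding method would restore the Base-style call form (a minimal sketch):

```julia
# Sketch: accept the base positionally, as Base's parse does, and forward
# to the keyword method above
parse_C(::Type{T}, c::Char, base::Integer) where {T<:Integer} = parse_C(T, c; base=base)

parse_C(Int, 'a', 62)  # 36, matching parse(Int, 'a', 62)
```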
```julia
@time parse(Int16, 'a')                        # -> 4 allocations; 160 B
@time parse_C(Int16, 'a')                      # -> 7 allocations; 432 B   (function definition overhead?)
@time parse_C(Int16, '9', legacy_fast=false)   # -> 17 allocations; 1.3 KiB

# with preallocated alphabet / alphanum
@time parse_C(Int16, '9', legacy_fast=true,  alphabet=x, alphanum=z)   # -> 5 allocations; 288 B
@time parse_C(Int16, '9', legacy_fast=false, alphabet=x, alphanum=z)   # -> 14 allocations; 1.0 KiB
```
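A bare `@time` on a single call mostly measures compile time and global-scope overhead; for a fairer comparison, BenchmarkTools’ `@btime` could be used instead (a sketch, assuming the package is installed):

```julia
using BenchmarkTools

@btime parse(Int16, 'a')
@btime parse_C(Int16, 'a')
@btime parse_C(Int16, '9'; legacy_fast=false)
```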

`parse_C2()` appears to be a bit faster than `parse.()` on a short array of chars (disclaimer: measured with `@time`).
Further tests: they’re about equal at ~7,500 chars, and it’s about a third as fast over ~7,800,000 chars, with fewer allocations but more memory.
At 780 chars the timings are roughly 0.0003 s and 0.00008 s. The modified lookup is the snippet below; a fuller self-contained sketch follows it.

```julia
    # vectorised lookup: `c` here is a Vector{Char}, so indexin returns a
    # vector of indices (0 for chars not found, on v0.6)
    xval = indexin(c, alphabet)
    in(0, xval) && throw(ArgumentError("char $c not in alphabet $alphabet"))
    any(i -> i > length(alphanum), xval) && throw(ArgumentError("index $xval not in alphanum $alphanum"))
    return alphanum[xval]
```
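For reference, a self-contained sketch of a vectorised variant built around that lookup (the name and defaults here are made up, so it is not necessarily what `parse_C2` actually does):

```julia
# Hypothetical vectorised variant (v0.6 semantics: indexin returns 0
# for elements that are not found)
function parse_vec(::Type{T}, cs::Vector{Char};
                   alphabet::Vector{Char}=collect('0':'9'),
                   alphanum::Vector{T}=T.(0:9)) where T<:Integer
    xval = indexin(cs, alphabet)      # one lookup for the whole vector
    in(0, xval) && throw(ArgumentError("some chars in $cs are not in alphabet $alphabet"))
    any(i -> i > length(alphanum), xval) && throw(ArgumentError("alphabet is longer than alphanum"))
    return alphanum[xval]             # Vector{T}
end

parse_vec(Int16, collect("1234"))     # -> Int16[1, 2, 3, 4]
```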