What is the effect of `parse` on `Char` types?

tparker · March 22, 2018, 1:21am

parse(T::Type, str::String) converts a string str to a number of type T. However, parse(T::Type, c::Char) also returns numbers, but I’m not sure where they’re coming from and I can’t find a matching method signature in the documentation. For example, parse(Int, 'a') and parse(Int, 'A') both return 10, which isn’t the standard Unicode code point for either of those characters (it’s not hex notation either, because it goes past 'f'). What do these numbers mean?

y4lu · March 22, 2018, 2:01am

The result depends on the base, which seems to default to 36 for the Char type. For bases up to 36, letters are case-insensitive: ‘A’/‘a’ through ‘Z’/‘z’ map to 10 through 35.
Bases above 36 go up to 62 and are case-sensitive for lowercase letters, with ‘a’ as 36 and ‘z’ as 61.

parse(Int, 'A', 36) == parse(Int, 'a', 36)
parse(Int, 'A', 36) != parse(Int, 'a', 37)
parse(Int, 'z', 37) #error
parse(Int, 'A', 10) #error
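
For anyone running these checks on Julia 1.x, here is a rough sketch of the same comparisons. It assumes the keyword form `parse(T, c, base=b)` instead of the positional base used above, and `digit` and `rejects` are hypothetical helper names, not part of Base.

```julia
# Assumption: Julia 1.x, where `base` is a keyword argument rather than
# the positional third argument used in this thread (Julia 0.6).
# `digit` and `rejects` are illustrative helpers, not Base functions.
digit(c, b) = parse(Int, c, base=b)

# Bases up to 36 ignore case: 'A' and 'a' are both 10.
@assert digit('A', 36) == digit('a', 36) == 10

# Bases above 36 are case-sensitive: lowercase continues from 36 up to 61.
@assert digit('A', 62) == 10
@assert digit('a', 62) == 36
@assert digit('z', 62) == 61

# Returns true if parsing `c` in base `b` throws an ArgumentError.
function rejects(c, b)
    try
        digit(c, b)
        return false
    catch err
        return err isa ArgumentError
    end
end

@assert rejects('z', 37)   # 'z' would be 61, which is not below 37
@assert rejects('A', 10)   # 'A' would be 10, which is not below 10
```

The two rejected cases correspond to the `#error` lines above: the digit value has to be strictly less than the base.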

Tamas_Papp · March 22, 2018, 6:31am

Try:

@edit parse(Int, 'a')

StefanKarpinski · March 22, 2018, 1:34pm

This is a bit of a disaster and should really be fixed. Inconsistency of meaning between different bases is pretty bad. Issue filed: https://github.com/JuliaLang/julia/issues/26571.

y4lu · March 27, 2018, 5:34am

I thought I’d try to write a compatible version, but there’s a small hitch with combining optional arguments and keyword arguments on v0.6.2: they more or less need to be one or the other, so this doesn’t replicate parse exactly. Keywords can still be optional, though.
It’s a bit slower too.

function parse_C(::Type{T}, c::Char; base::Integer=36, legacy_fast::Bool=true, alphabet::Array{Char,1}=collect('0':'9'), alphanum::Array{T,1}=T.(0:9) ) where T<:Integer
  if(legacy_fast==true)
    a::Int = (base <= 36 ? 10 : 36)
    2 <= base <= 62 || throw(ArgumentError("invalid base: base must be 2 ≤ base ≤ 62, got $base"))
    d = '0' <= c <= '9' ? c-'0'    :
        'A' <= c <= 'Z' ? c-'A'+10 :
        'a' <= c <= 'z' ? c-'a'+a  : throw(ArgumentError("invalid digit: $(repr(c))"))
    d < base || throw(ArgumentError("invalid base $base digit $(repr(c))"))
    return convert(T, d)
  else
    xval::Int = indexin([c], alphabet)[1];
    xval > 0 || throw(ArgumentError("char $c not in alphabet $alphabet"))
    xval <= length(alphanum) || throw(ArgumentError("index $xval not in alphanum $alphanum"))
    return alphanum[xval];
  end;
end

@time parse(Int16, 'a') -> 4 alloc; 160 B
@time parse_C(Int16, 'a') -> 7 alloc; 432 B ##function definition overhead?
@time parse_C(Int16, '9', legacy_fast=false) -> 17 alloc; 1.3 KiB 
##with preallocated alphabet / alphanum
@time parse_C(Int16, '9', legacy_fast=true, alphabet=x, alphanum=z) -> 5 alloc; 288 B
@time parse_C(Int16, '9', legacy_fast=false, alphabet=x, alphanum=z) -> 14 alloc; 1.0 KiB

parse_C2() appears to be a bit faster than parse.() on a short array of Char (disclaimer: measured with @time).
Further testing: it’s about equal at ~7500 chars and about 1/3 as fast over ~7800000 chars, with fewer allocations but more memory.
For 780 chars the times are ~0.0003 sec and ~0.00008 sec.

    xval = indexin(c, alphabet);
    # in(item, collection): on v0.6 indexin returns 0 for an entry that was not found
    in(0, xval) && throw(ArgumentError("char $c not in alphabet $alphabet"))
    contains(>, xval, length(alphanum)) && throw(ArgumentError("index $xval not in alphanum $alphanum"))
    return alphanum[xval];
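
The fragment above targets v0.6. As a minimal sketch of the same table-lookup idea written as a complete function for Julia 1.x, something like the following might work. `parse_chars` is a hypothetical name, and the sketch assumes that `indexin` returns `nothing` for entries that are not found (the Julia 1.x behavior) rather than 0.

```julia
# Hypothetical helper, not from Base: look up each character in `alphabet`
# and return the value stored at the same position in `alphanum`.
function parse_chars(chars::AbstractVector{<:AbstractChar};
                     alphabet::Vector{Char} = collect('0':'9'),
                     alphanum::Vector{<:Integer} = collect(0:9))
    length(alphabet) == length(alphanum) ||
        throw(ArgumentError("alphabet and alphanum must have the same length"))

    # On Julia 1.x, indexin yields `nothing` for characters missing from alphabet.
    idx = indexin(chars, alphabet)
    bad = findfirst(i -> i === nothing, idx)
    bad === nothing ||
        throw(ArgumentError("char $(repr(chars[bad])) not in alphabet"))

    # Every entry is an Int at this point, so narrow the element type and index.
    return alphanum[Int[i for i in idx]]
end

# Usage: default decimal digits, then a custom two-letter alphabet.
@assert parse_chars(collect("2018")) == [2, 0, 1, 8]
@assert parse_chars(['b', 'a'], alphabet=['a', 'b'], alphanum=[10, 11]) == [11, 10]
```

Checking the two table lengths up front replaces the per-index bounds check in the fragment, since a valid index into `alphabet` is then always a valid index into `alphanum`.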
