Style question: defining methods for `tryparse`, `parse`


#1

I am parsing some datasets and it would feel quite natural to define things like

immutable DateFormat
    format::String
end

tryparse(DateFormat("yyyymmdd"), "19800101") # => Nullable{Date}(Date(1980,1,1))

This would allow specifying parsers in a richer language than the type system. I am wondering if this is good style. E.g., is there an invariant that tryparse(T, string) should return Nullable{T}?
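A minimal sketch of the idea, assuming Julia 1.x, where Nullable no longer exists and tryparse returns nothing on failure. DateSpec is a hypothetical name chosen to avoid clashing with Dates.DateFormat; the method only delegates to the tryparse(Date, str, format) method of the Dates standard library.

```julia
using Dates

# Hypothetical parsing-specification object: "a date, in this format".
# Named DateSpec so it does not clash with Dates.DateFormat.
struct DateSpec
    format::String
end

# Extend Base.tryparse (rather than defining a new function) for the spec.
# Returns a Date on success and `nothing` on failure, the Julia 1.x
# counterpart of returning a Nullable{Date}.
Base.tryparse(spec::DateSpec, s::AbstractString) =
    tryparse(Date, s, DateFormat(spec.format))

tryparse(DateSpec("yyyymmdd"), "19800101")   # Date(1980, 1, 1)
tryparse(DateSpec("yyyymmdd"), "nonsense")   # nothing
```

Storing a prebuilt DateFormat in the struct instead of the string would avoid re-parsing the format on every call.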


#2

In general, I think you should either return a Nullable, or overload parse instead.

But don’t the existing Date parsing methods listed in the docs work for you?


#3

Existing methods work fine; in fact, I am building on them. But I find it useful to encapsulate parsing information in an object, e.g. “this is a date, parse it like this” or “parse this field, treating the following strings as missing data”. The type system is not rich enough to describe this.
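The second kind of specification can be sketched the same way. This assumes Julia 1.x; WithMissing is a hypothetical name, and returning missing for the sentinel strings (as distinct from nothing for unparsable input) is a design choice of this sketch, not something proposed in the thread.

```julia
# Hypothetical spec: parse as T, but treat the listed sentinel strings as missing data.
struct WithMissing{T}
    sentinels::Vector{String}
end

# `missing` for a sentinel, `nothing` if the string cannot be parsed as T,
# otherwise the parsed value (delegates to Base's tryparse(T, s)).
function Base.tryparse(spec::WithMissing{T}, s::AbstractString) where {T}
    s in spec.sentinels && return missing
    return tryparse(T, s)
end

field = WithMissing{Int}(["NA", "-99"])
tryparse(field, "42")   # 42
tryparse(field, "NA")   # missing
tryparse(field, "4x")   # nothing
```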

(EDIT) To make things concrete, this is an example I am using for a dataset at the moment:

immutable CustomDate end

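# Evaluate `value::type` and wrap the result in Nullable{type}; return an empty
# Nullable{type} if evaluation throws catch_errors (ArgumentError by default).
macro nullable_catch(expr, catch_errors=:ArgumentError)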
    @assert expr.head ≡ :(::) "Use this macro as @nullable_catch value::type."
    (expr_value, expr_type) = map(esc, expr.args)
    quote
        try
            Nullable{$expr_type}($expr_value)
        catch e
            if isa(e, $(esc(catch_errors)))
                Nullable{$expr_type}()
            else
                rethrow()
            end
        end
    end
end

# Extend Base.tryparse instead of shadowing it; `s` avoids shadowing Base.string
function Base.tryparse(::Type{CustomDate}, s::AbstractString)
    if !isascii(s) || (s == "00000000")     # non-ASCII or all zeros: missing
        Nullable{Date}()
    elseif endswith(s, "00")                # day is corrected to 1
        @nullable_catch Date(s[1:end-2], "yyyymm")::Date
    else
        @nullable_catch Date(s, "yyyymmdd")::Date
    end
end

which I call like this:

tryparse(CustomDate, "19800101")
tryparse(CustomDate, "19800100")    # bogus dates in my data
tryparse(CustomDate, "nonsensical") # Nullable{Date}()

The design/style questions are:

  1. whether I should define methods for tryparse anyway, or make a new function,
  2. whether I should define a function that returns the parsed type for parsing specifications which are not Julia types, e.g.
parsedtype(::Type{CustomDate}) = Date

with a fallback

parsedtype{T}(::Type{T}) = T

for other types. But maybe that is too general, and I should define it only for types that tryparse accepts? Interfaces are hard :)
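One possible use of parsedtype, sketched in Julia 1.x (where tryparse returns nothing instead of a Nullable): it lets generic code know the result type before parsing anything. parsecolumn is a hypothetical helper, not part of any library; it is exercised here with Int because this block does not define tryparse for CustomDate.

```julia
using Dates

struct CustomDate end   # marker type, as in the post above

# Result type of a parsing specification; fallback: a type parses to itself.
parsedtype(::Type{T}) where {T} = T
parsedtype(::Type{CustomDate}) = Date

# Hypothetical helper: the element type of the output column is known
# before any string has been parsed, so the vector can be allocated up front.
function parsecolumn(spec, strings)
    out = Vector{Union{parsedtype(spec), Nothing}}(undef, length(strings))
    for (i, s) in enumerate(strings)
        out[i] = tryparse(spec, s)
    end
    return out
end

parsecolumn(Int, ["1", "x", "3"])   # elements 1, nothing, 3
```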


#4

OK. I think it’s fine to add methods to parse or tryparse as long as they only accept your CustomDate type so that they don’t interfere with Base types.

As regards question 2, I guess it depends on how general your needs are. But if you really want a clean API, I would say the @nullable_catch trick is backwards: instead of catching an exception and returning null, it would be better to parse dates using tryparse (I first wrote trycatch here by mistake), which returns a Nullable. That still allows throwing an exception when the result is null, if you need that behavior, and performance would be much better, since throwing and catching exceptions is expensive. Of course, that cannot work without changes in Julia Base: we would need parse and tryparse methods for dates. See this pull request.
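For comparison, a sketch of the exception-free approach, assuming Julia 1.x, whose Dates standard library provides tryparse(Date, str, format) (returning nothing rather than a Nullable) and the dateformat"..." string macro. It restates the CustomDate rules from #3; it is not the code from the linked pull request.

```julia
using Dates

struct CustomDate end   # marker type, as in #3

const YMD = dateformat"yyyymmdd"   # format objects built once, not per call
const YM  = dateformat"yyyymm"

# Same rules as the @nullable_catch version in #3, but no exception is
# thrown or caught: Dates' tryparse returns `nothing` on failure.
function Base.tryparse(::Type{CustomDate}, s::AbstractString)
    (!isascii(s) || s == "00000000") && return nothing
    if endswith(s, "00")   # day 00 in the data: parse yyyymm, day becomes 1
        return tryparse(Date, s[1:end-2], YM)   # byte slicing is safe: s is ASCII
    else
        return tryparse(Date, s, YMD)
    end
end

tryparse(CustomDate, "19800101")     # Date(1980, 1, 1)
tryparse(CustomDate, "19800100")     # Date(1980, 1, 1)
tryparse(CustomDate, "nonsensical")  # nothing
```

A caller that wants parse-like behavior can check for nothing and throw; the common path never pays for exception handling.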


#5

Thanks, this is good advice. Where is trycatch? I could not find it.


#6

In parse.jl, mostly. Let’s test the code highlighting feature of Discourse:

julia> methods(tryparse)
# 9 methods for generic function "tryparse":
tryparse{T<:Integer}(::Type{T}, s::AbstractString, base::Integer) in Base at parse.jl:142
tryparse{T<:Integer}(::Type{T}, s::AbstractString) in Base at parse.jl:144
tryparse(::Type{Float64}, s::String) in Base at parse.jl:153
tryparse(::Type{Float64}, s::SubString{String}) in Base at parse.jl:154
tryparse(::Type{Float32}, s::String) in Base at parse.jl:156
tryparse(::Type{Float32}, s::SubString{String}) in Base at parse.jl:157
tryparse{T<:Union{Float32,Float64}}(::Type{T}, s::AbstractString) in Base at parse.jl:159
tryparse(::Type{BigFloat}, s::AbstractString) in Base.MPFR at mpfr.jl:113
tryparse(::Type{BigFloat}, s::AbstractString, base::Int64) in Base.MPFR at mpfr.jl:113


#7

The line immutable CustomDate end seems like a good case for using a value type (Val) instead. I would prefer that style, since it states the role of the type explicitly. As long as you keep an eye on performance, you can come up with some interesting code, for example: tryparse(Val{:custom}, Date, "19800101").


#8

I am afraid I don’t understand. You mentioned trycatch, which I could only find in the Scheme code and in one issue (about something different), not tryparse (which I am already using).


#9

Excellent suggestion! I will probably do it this way. But perhaps the parameter that selects the format should come last, like the base argument of parse(Int, ...).
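A sketch combining the Val suggestion from #7 with the format-last order from #9, assuming Julia 1.x (tryparse returns nothing on failure). The rule name :custom and its rules (all zeros is missing, day 00 becomes day 1) are taken from the CustomDate example in #3.

```julia
using Dates

# Named parsing rule selected via Val, passed as the last argument.
# :custom: "00000000" or non-ASCII input is missing, day "00" becomes day 1.
function Base.tryparse(::Type{Date}, s::AbstractString, ::Val{:custom})
    (!isascii(s) || s == "00000000") && return nothing
    endswith(s, "00") && return tryparse(Date, s[1:end-2], dateformat"yyyymm")
    return tryparse(Date, s, dateformat"yyyymmdd")
end

tryparse(Date, "19800100", Val(:custom))   # Date(1980, 1, 1)
```

Note that this adds a method to a Base function for types the user owns neither of (Date and Val), which is the kind of interference with Base types that #4 warns about; a user-defined marker type such as CustomDate avoids it.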


#10

Sorry, of course I meant tryparse. My point is that you shouldn’t have to call the Date constructor, which throws; instead, a tryparse method for dates should be available to replace it. This is what the linked PR does.


#11

Not specific to tryparse, but I’ve used string macros for this before (e.g. Glob fnmatch, the Regex constructor, and XML XPath parsing). This also gives the benefit of parse-time validation and return-type computation.

To give a more concrete example:

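immutable DateReader{RType}
    parser_syntax_tree # some decomposed representation of the format
end
# rettype(str) and g(str) are placeholders: they compute the result type and
# decompose the format string once, when the macro is expanded
macro dateformat_str(str)
    return DateReader{rettype(str)}(g(str))
end
tryparse(dateformat"yyyymmdd", "19800101")::Nullable{Date}  # RType is Date here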
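A runnable variant of this idea, assuming Julia 1.x. DateReader, rettype and the datereader"..." macro are hypothetical names, and the rettype rule (an H, M or S in the format means DateTime, otherwise Date) is a simplification made up for this sketch. Julia 1.x's Dates standard library already ships a dateformat"..." string macro, which is why the sketch uses a different name.

```julia
using Dates

# Reader whose type parameter R records the result type of parsing.
struct DateReader{R}
    df::DateFormat
end

# Hypothetical rule: hour/minute/second codes in the format mean DateTime.
rettype(fmt::AbstractString) = any(c -> c in "HMS", fmt) ? DateTime : Date

# Runs at macro-expansion time: the result type is computed and the
# DateFormat is built once, then embedded in the code as a constant.
macro datereader_str(fmt)
    R = rettype(fmt)
    return DateReader{R}(DateFormat(fmt))
end

# Returns an R on success, `nothing` on failure.
Base.tryparse(r::DateReader{R}, s::AbstractString) where {R} = tryparse(R, s, r.df)

tryparse(datereader"yyyymmdd", "19800101")                   # Date(1980, 1, 1)
tryparse(datereader"yyyy-mm-dd HH:MM", "1980-01-01 12:30")   # DateTime(1980, 1, 1, 12, 30)
```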