I am parsing some datasets and it would feel quite natural to define things like
immutable DateFormat
    format::String
end
tryparse(DateFormat("yyyymmdd"), "19800101") # => Nullable{Date}(1980,1,1)
This would allow specifying parsers in a richer language than the type system. I am wondering if this is good style. E.g., is there an invariant that tryparse(T, string) should return Nullable{T}?
Existing methods work fine; in fact, I am building on them. But I find it useful to encapsulate parsing information into an object, e.g. “this is a date, parse it like this”, or “parse this field with the following strings treated as missing data”. The type system is not rich enough to describe this.
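A minimal sketch of such a specification object follows. It is written for Julia 1.x rather than the older syntax used in this thread, so it uses `struct` instead of `immutable` and signals failure with `nothing` instead of an empty `Nullable`. The names `DateSpec` and `missing_strings` are hypothetical and chosen only for illustration.

```julia
using Dates

# Hypothetical parsing specification: a date format plus the strings
# that should be treated as missing data.
struct DateSpec
    format::DateFormat
    missing_strings::Set{String}
end

# Convenience constructor taking a format string and any collection of strings.
DateSpec(format::AbstractString, missing_strings) =
    DateSpec(DateFormat(format), Set{String}(missing_strings))

# Add a method to Base.tryparse that only accepts the spec object,
# so methods for Base types are left untouched.
function Base.tryparse(spec::DateSpec, str::AbstractString)
    str in spec.missing_strings && return nothing
    return tryparse(Date, str, spec.format)
end

spec = DateSpec("yyyymmdd", ["00000000", "NA"])
parsed  = tryparse(spec, "19800101")   # a Date
skipped = tryparse(spec, "00000000")   # nothing: listed as missing
```

Because the method dispatches on an instance rather than on a type, the result type is determined by the spec and not by the first argument.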
(EDIT) To make things concrete, this is an example I am using for a dataset at the moment:
immutable CustomDate end
macro nullable_catch(expr, catch_errors=:ArgumentError)
    @assert expr.head ≡ :(::) "Use this macro as @nullable_catch value::type."
    (expr_value, expr_type) = map(esc, expr.args)
    quote
        try
            Nullable{$expr_type}($expr_value)
        catch e
            if isa(e, $(esc(catch_errors)))
                Nullable{$expr_type}()
            else
                rethrow()
            end
        end
    end
end
# Qualify with Base so this adds a method to Base.tryparse instead of defining a new function.
function Base.tryparse(::Type{CustomDate}, string)
    if !isascii(string) || (string == "00000000")
        Nullable{Date}()
    elseif endswith(string, "00") # day is corrected to 1
        @nullable_catch Date(string[1:end-2], "yyyymm")::Date
    else
        @nullable_catch Date(string, "yyyymmdd")::Date
    end
end
which I call like this:
tryparse(CustomDate, "19800101")
tryparse(CustomDate, "19800100") # bogus dates in my data
tryparse(CustomDate, "nonsensical") # Nullable{Date}()
The design/style questions are:
1. whether I should define methods for tryparse anyway, or make a new function,
2. whether I need a function that returns the parsed type for parsing specifications which are not Julia types, e.g.
parsedtype(::Type{CustomDate}) = Date
with a fallback
parsedtype{T}(::Type{T}) = T
for other types. But maybe that is too general, and I should only define it for types which tryparse accepts? Interfaces are hard.
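As a rough illustration of what question 2 would buy, here is a sketch in Julia 1.x syntax (`where {T}` instead of `parsedtype{T}`, and `nothing` instead of an empty `Nullable`). `parsecolumn` is a hypothetical helper that is not part of the question; it only shows how `parsedtype` lets a caller allocate a correctly typed container before parsing anything.

```julia
using Dates

struct CustomDate end   # marker type from the question

# The specification CustomDate parses to Date; ordinary types parse to themselves.
parsedtype(::Type{CustomDate}) = Date
parsedtype(::Type{T}) where {T} = T

# Hypothetical helper: parse a vector of strings into a typed column,
# storing `nothing` wherever parsing fails.
function parsecolumn(spec, strings)
    T = parsedtype(spec)
    out = Vector{Union{T,Nothing}}(undef, length(strings))
    for (i, s) in enumerate(strings)
        out[i] = tryparse(spec, s)
    end
    return out
end

col = parsecolumn(Int, ["1", "x", "3"])
```

The generic fallback does make `parsedtype` answer for types that `tryparse` cannot handle at all, which is the over-generality the question worries about.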
OK. I think it’s fine to add methods to parse or tryparse as long as they only accept your CustomDate type so that they don’t interfere with Base types.
As regards question 2, I guess it depends on how general your needs are. But if you really want a clean API, I would say the @nullable_catch trick is backwards: instead of catching an exception and returning null, it would be better to parse dates using tryparse (edited; I originally wrote "trycatch"), which returns a Nullable. You can still throw an exception when it returns null if you need that behavior. Performance would be much better. Of course, that cannot work without changes in Julia Base: we would need parse and tryparse for dates. See this pull request.
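For comparison, current Julia versions (1.x) ship `tryparse(Date, str, dateformat)` in the `Dates` standard library, so the exception-catching macro is no longer needed there. The sketch below rewrites the `CustomDate` method from the question under that assumption; failure is signalled with `nothing` because `Nullable` is no longer part of Base. The constant names `YM` and `YMD` are hypothetical.

```julia
using Dates

struct CustomDate end   # marker type from the question, in Julia 1.x syntax

# Build the formats once instead of on every call.
const YM  = dateformat"yyyymm"
const YMD = dateformat"yyyymmdd"

function Base.tryparse(::Type{CustomDate}, str::AbstractString)
    if !isascii(str) || str == "00000000"
        nothing
    elseif endswith(str, "00")   # bogus day "00": parse year and month only
        tryparse(Date, str[1:end-2], YM)
    else
        tryparse(Date, str, YMD)
    end
end

good  = tryparse(CustomDate, "19800101")
fixed = tryparse(CustomDate, "19800100")
bad   = tryparse(CustomDate, "nonsensical")
```

No exception is thrown or caught on the failure path, which is the point made above about performance.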
In parse.jl, mostly. Let’s test the code highlighting feature of Discourse:
julia> methods(tryparse)
# 9 methods for generic function "tryparse":
tryparse{T<:Integer}(::Type{T}, s::AbstractString, base::Integer) in Base at parse.jl:142
tryparse{T<:Integer}(::Type{T}, s::AbstractString) in Base at parse.jl:144
tryparse(::Type{Float64}, s::String) in Base at parse.jl:153
tryparse(::Type{Float64}, s::SubString{String}) in Base at parse.jl:154
tryparse(::Type{Float32}, s::String) in Base at parse.jl:156
tryparse(::Type{Float32}, s::SubString{String}) in Base at parse.jl:157
tryparse{T<:Union{Float32,Float64}}(::Type{T}, s::AbstractString) in Base at parse.jl:159
tryparse(::Type{BigFloat}, s::AbstractString) in Base.MPFR at mpfr.jl:113
tryparse(::Type{BigFloat}, s::AbstractString, base::Int64) in Base.MPFR at mpfr.jl:113
The line immutable CustomDate end seems like a good case for using a Val type instead. I would prefer the latter style, since it explicitly states the role of the type. As long as you keep an eye on performance, you can come up with some interesting code, for example: tryparse(Val{:custom}, Date, "19800101").
I am afraid I don’t understand. You mentioned trycatch, which I only found in the Scheme code and another mention in an issue (for something different), not tryparse (which I am already using).
Excellent suggestion! Probably I will do it this way. But perhaps the parameter that decides the format should come last, like base for parse(Int, ...).
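A small sketch of the Val-based style with the format selector in last position, in Julia 1.x syntax. One assumption is worth flagging: `Date`, `Val`, and `tryparse` all come from Base or the standard library, so adding this method to `Base.tryparse` itself would be type piracy. The sketch therefore uses a function of its own, with the hypothetical name `tryparsefield`; the selectors `:compact` and `:iso` are also made up for illustration.

```julia
using Dates

# Hypothetical function owned by this code; the trailing Val selects the format,
# much like the trailing `base` argument of parse(Int, str, base).
tryparsefield(::Type{Date}, str::AbstractString, ::Val{:compact}) =
    tryparse(Date, str, dateformat"yyyymmdd")

tryparsefield(::Type{Date}, str::AbstractString, ::Val{:iso}) =
    tryparse(Date, str, dateformat"yyyy-mm-dd")

a = tryparsefield(Date, "19800101", Val(:compact))
b = tryparsefield(Date, "1980-01-01", Val(:iso))
c = tryparsefield(Date, "nonsensical", Val(:compact))
```

Here the first argument is always the result type, so no separate `parsedtype` function is needed.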
Sorry, of course I meant tryparse. What I meant is that you shouldn’t have to call the Date constructor, which throws, but have a tryparse method available to replace it instead. This is what the linked PR does.
Not specific to tryparse, but I’ve used string macros for this before (e.g. Glob fnmatch, the Regex constructor, and XML xpath parsing). This also gives the benefit of parse-time validation and return type calculation.
To give a more concrete example:
immutable DateReader{RType}
    parser_syntax_tree # some decomposed representation
end
macro dateformat_str(str)
    # rettype(str) and g(str) stand in for format-analysis helpers not shown here
    return DateReader{rettype(str)}( g(str) )
end
tryparse(dateformat"yyyymmdd", "19800101")::Nullable{RType}
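A runnable variant of this idea for Julia 1.x might look as follows. Since `Dates` already exports a `dateformat"..."` string macro there, the sketch uses the hypothetical name `datereader"..."` to avoid a clash. The `rettype` rule (any hour, minute, second, or millisecond code means `DateTime`, otherwise `Date`) is an assumption made for illustration, and the stored representation is simply a `DateFormat` rather than a custom syntax tree.

```julia
using Dates

struct DateReader{RType}
    format::DateFormat
end

# Hypothetical rule for computing the return type from the format string.
rettype(fmt::AbstractString) = occursin(r"[HMSs]", fmt) ? DateTime : Date

# The macro body runs at macro-expansion time, so the format is analysed
# once and the resulting DateReader is embedded in the code as a literal.
macro datereader_str(fmt)
    return DateReader{rettype(fmt)}(DateFormat(fmt))
end

# The return type follows from the type parameter of the reader.
Base.tryparse(r::DateReader{R}, str::AbstractString) where {R} =
    tryparse(R, str, r.format)

d = tryparse(datereader"yyyymmdd", "19800101")
```

As in the other Julia 1.x sketches, a failed parse yields `nothing`, so the result type is `Union{RType, Nothing}` rather than `Nullable{RType}`.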