Suppose I have the output of some program. It is plain text without any specific syntax, so I need to write regular expressions to capture the information I want. For example, suppose I want to capture the following pattern:
XXXX
X 0.0 0.0 0.0
Y 0.0 0.0 0.0
Z 0.0 0.0 0.0
I define the following type:
struct XXXX{A<:AbstractMatrix}
    data::A
end
Should I extend Base.parse as follows?
function Base.parse(::Type{T}, str::AbstractString) where {S,T<:XXXX{S}}
    # find the pattern using regular expressions
    # parse the data
end
So here comes my 1st question: since XXXX has a type parameter A, which is itself a container type (AbstractMatrix), should I respect the users' choice of S? That is, if they call
parse(XXXX, str) # Returns an `XXXX{Matrix}` whose element type depends on the content of `str`
parse(XXXX{SMatrix}, str) # Returns an `XXXX{SMatrix}` whose element type depends on the content of `str`
parse(XXXX{SMatrix{Float64}}, str) # Returns an `XXXX{SMatrix{Float64}}` even if the elements in `str` are all integers
should I return what the comments describe?
As far as I can tell, Base.parse for Complex does almost what I described:
julia> parse(Complex, "1 + 1im") |> typeof
ERROR: MethodError: no method matching tryparse_internal(::Type{Complex}, ::String, ::Int64, ::Int64, ::Bool)
Closest candidates are:
tryparse_internal(::Type{Bool}, ::Union{SubString{String}, String}, ::Int64, ::Int64, ::Integer, ::Bool) at parse.jl:178
tryparse_internal(::Type{BigInt}, ::AbstractString, ::Int64, ::Int64, ::Integer, ::Bool) at gmp.jl:261
tryparse_internal(::Type{T<:Integer}, ::AbstractString, ::Int64, ::Int64, ::Bool) where T<:Integer at parse.jl:375
...
Stacktrace:
[1] parse(::Type{Complex}, ::String) at ./parse.jl:380
[2] top-level scope at REPL[4]:100
julia> parse(Complex{Float64}, "1 + 1im") |> typeof
Complex{Float64}
julia> parse(Complex{Int32}, "1 + 1im") |> typeof
Complex{Int32}
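To make the first question concrete, here is a minimal sketch of the behaviour I have in mind, restricted to plain Matrix storage. The definition of REGEX_OF_XXXX, the helper `row`, the number pattern NUM, and the Float64 default are all assumptions made for illustration; the real output format may need a different regex.

```julia
# Same type as in the question.
struct XXXX{A<:AbstractMatrix}
    data::A
end

# Assumed number format: optional sign, digits, optional fraction and exponent.
const NUM = raw"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?"

# Hypothetical helper: one labelled row followed by three captured numbers.
row(label) = "$label\\s+($NUM)\\s+($NUM)\\s+($NUM)"

# Assumed layout: the header "XXXX", then rows labelled X, Y and Z.
const REGEX_OF_XXXX = Regex("XXXX\\s+" * join(row.(["X", "Y", "Z"]), "\\s+"))

# The caller fixes the element type T, and it is respected.
function Base.parse(::Type{XXXX{Matrix{T}}}, str::AbstractString) where {T<:Real}
    m = match(REGEX_OF_XXXX, str)
    m === nothing && throw(ArgumentError("no XXXX block found"))
    nums = [parse(T, c) for c in m.captures]
    # Captures come in row-major order, Julia arrays are column-major.
    return XXXX(permutedims(reshape(nums, 3, 3)))
end

# No parameter given: fall back to Float64 (an arbitrary default in this sketch).
Base.parse(::Type{XXXX}, str::AbstractString) = parse(XXXX{Matrix{Float64}}, str)
```

With this sketch, `parse(XXXX{Matrix{Int}}, str)` throws if `str` contains `0.0`, because `parse(Int, "0.0")` fails, whereas `parse(XXXX, str)` always yields Float64 elements rather than inferring the type from the text.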
My 2nd question is: what if there are several matches in the file? I tried some methods of Base.parse, and none of them seems to accept multiple patterns in str:
julia> parse(Complex{Int32}, "1 + 1im 2+2im")
ERROR: ArgumentError: invalid base 10 digit 'i' in " 1im 2+2"
Stacktrace:
[1] tryparse_internal(::Type{Int32}, ::String, ::Int64, ::Int64, ::Int64, ::Bool) at ./parse.jl:132
[2] tryparse_internal at ./parse.jl:375 [inlined]
[3] tryparse_internal(::Type{Complex{Int32}}, ::String, ::Int64, ::Int64, ::Bool) at ./parse.jl:345
[4] parse(::Type{Complex{Int32}}, ::String) at ./parse.jl:380
[5] top-level scope at REPL[6]:1
So I guess I can only parse one XXXX at a time? How should I parse all of them in a file? Should I let users do
parse.(XXXX, [m.match for m in eachmatch(REGEX_OF_XXXX, str)])
or write a function
parse_xxxx(str) = parse.(XXXX, [m.match for m in eachmatch(REGEX_OF_XXXX, str)])
Should I call it parse_xxxx or read_xxxx?
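The "one match at a time" idea can be tried out with Complex, since Base already knows how to parse a single value. This is only a sketch: the pattern REGEX_OF_COMPLEX and the function name `parse_all` are hypothetical, and the regex only covers integer literals of the form `a + bim`.

```julia
# Hypothetical pattern for integer complex literals such as "1 + 1im" or "2+2im".
const REGEX_OF_COMPLEX = r"[-+]?\d+\s*[-+]\s*\d+im"

# Find every match, then hand each matched substring to the
# single-value `parse` method.
parse_all(::Type{T}, str::AbstractString) where {T<:Complex} =
    [parse(T, m.match) for m in eachmatch(REGEX_OF_COMPLEX, str)]

parse_all(Complex{Int32}, "1 + 1im 2+2im")
```

The same shape would apply to XXXX: the regex finds the blocks, and `parse` stays responsible for exactly one block.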
My 3rd question is: what should happen if no pattern is found in the file? Should I throw a Meta.ParseError? According to its docs,
The expression passed to the parse function could not be interpreted as a valid Julia expression.
But str comes from another program that has nothing to do with Julia. Or should I define a custom exception type ParseFailure?
struct ParseFailure <: Exception
    msg::String
end
Which is more idiomatic?
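For comparison, Base pairs `parse`, which throws (an ArgumentError in the Complex example above), with `tryparse`, which returns `nothing` on failure. A custom exception could follow the same pairing. The sketch below repeats the ParseFailure definition so that it runs on its own; the helper names `tryfirstmatch` and `firstmatch` are hypothetical.

```julia
# Custom exception, as proposed in the question.
struct ParseFailure <: Exception
    msg::String
end

# Print the error as "ParseFailure: <msg>" instead of the default struct dump.
Base.showerror(io::IO, e::ParseFailure) = print(io, "ParseFailure: ", e.msg)

# Non-throwing variant: the first matched substring, or `nothing`.
function tryfirstmatch(regex::Regex, str::AbstractString)
    m = match(regex, str)
    return m === nothing ? nothing : m.match
end

# Throwing variant, built on top of the non-throwing one.
function firstmatch(regex::Regex, str::AbstractString)
    found = tryfirstmatch(regex, str)
    found === nothing && throw(ParseFailure("no match for $regex"))
    return found
end
```

Callers who expect missing blocks can use the `try...` variant and check for `nothing`, while the throwing variant suits files where a missing block means the output is corrupt.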
I would be grateful for your help!