Suppose I have output from some software. It is plain text without any specific syntax, so I need to write regular expressions to capture the information I want. For example, say I want to capture the following pattern:
```
XXXX
X 0.0 0.0 0.0
Y 0.0 0.0 0.0
Z 0.0 0.0 0.0
```
I define the following type:

```julia
struct XXXX{A<:AbstractMatrix}
    data::A
end
```
Should I extend `Base.parse` like this?

```julia
function Base.parse(::Type{T}, str::AbstractString) where {S,T<:XXXX{S}}
    # find the pattern using regular expressions
    # parse the data
end
```
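A minimal sketch of what the body of such a method could look like, assuming each of the X/Y/Z rows holds exactly three whitespace-separated numbers and always parsing them as `Float64` (so it does not yet infer the element type from `str`). `XXXX_REGEX` is a hypothetical name for the pattern, and the method only covers the bare `parse(XXXX, str)` call:

```julia
struct XXXX{A<:AbstractMatrix}
    data::A
end

# Hypothetical pattern: a line "XXXX", then X/Y/Z rows with exactly three values each.
# The `m` flag makes `^` match at the start of any line; `\r?` tolerates CRLF endings.
const XXXX_REGEX = r"^XXXX[ \t]*\r?\n[ \t]*X((?:[ \t]+\S+){3})[ \t]*\r?\n[ \t]*Y((?:[ \t]+\S+){3})[ \t]*\r?\n[ \t]*Z((?:[ \t]+\S+){3})"m

function Base.parse(::Type{XXXX}, str::AbstractString)
    m = match(XXXX_REGEX, str)
    m === nothing && throw(ArgumentError("no XXXX block found"))
    # Each capture holds the three numbers of one row (assumed to be Float64).
    rows = [parse.(Float64, split(c)) for c in m.captures]
    # hcat puts each row in a column, so transpose back to one row per line.
    return XXXX(permutedims(reduce(hcat, rows)))
end
```

Throwing `ArgumentError` on a missing block mirrors what Base's `parse` does for malformed numbers.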
So here comes my 1st question: since `XXXX` has a type parameter `A`, which is itself a container type (`AbstractMatrix`), should I respect the user's choice of `A`? That is, if they call

```julia
using StaticArrays  # provides SMatrix

parse(XXXX, str)                        # an `XXXX` wrapping a `Matrix`; element type depends on the contents of `str`
parse(XXXX{SMatrix}, str)               # an `XXXX` wrapping an `SMatrix`; element type depends on the contents of `str`
parse(XXXX{SMatrix{3,3,Float64}}, str)  # an `XXXX` wrapping an `SMatrix{3,3,Float64}`, even if every number in `str` is an integer
```

should I return what the comments say?
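One possible way to honour the caller's choice, sketched with plain `Matrix` types so it runs without StaticArrays: dispatch separately on the bare `XXXX` and on a concrete `XXXX{A}`, and let the default constructor convert the parsed data to `A`. The helper `read_rows` is a hypothetical line-based stand-in for the regex step, and the default element type `Float64` is an assumption:

```julia
struct XXXX{A<:AbstractMatrix}
    data::A
end

# Hypothetical helper: read every "X/Y/Z a b c" line into a Matrix{Float64}, one row per line.
function read_rows(str::AbstractString)
    rows = Vector{Vector{Float64}}()
    for line in eachline(IOBuffer(str))
        fields = split(line)
        if !isempty(fields) && fields[1] in ("X", "Y", "Z")
            push!(rows, parse.(Float64, fields[2:end]))
        end
    end
    return permutedims(reduce(hcat, rows))
end

# Bare `XXXX`: the caller has no preference, so return the default container.
Base.parse(::Type{XXXX}, str::AbstractString) = XXXX(read_rows(str))

# `XXXX{A}`: the default constructor converts the parsed matrix to `A`,
# e.g. Matrix{Int} or Matrix{Float32} (throws InexactError if a value does not fit).
Base.parse(::Type{XXXX{A}}, str::AbstractString) where {A<:AbstractMatrix} =
    XXXX{A}(read_rows(str))
```

One subtlety: `XXXX{Matrix}` (no element type) is itself a concrete type whose field is declared as the abstract `Matrix`, so this sketch returns an `XXXX{Matrix}` for it, not an `XXXX{Matrix{Float64}}`; the same applies to `XXXX{SMatrix}`. If the intent is "any `SMatrix`", `XXXX{<:SMatrix}` is the type that expresses that, and it would need its own method.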
Base's `parse` for `Complex` almost does this, except that a bare `Complex` (without an element type) throws a `MethodError`:
```julia
julia> parse(Complex, "1 + 1im") |> typeof
ERROR: MethodError: no method matching tryparse_internal(::Type{Complex}, ::String, ::Int64, ::Int64, ::Bool)
Closest candidates are:
  tryparse_internal(::Type{Bool}, ::Union{SubString{String}, String}, ::Int64, ::Int64, ::Integer, ::Bool) at parse.jl:178
  tryparse_internal(::Type{BigInt}, ::AbstractString, ::Int64, ::Int64, ::Integer, ::Bool) at gmp.jl:261
  tryparse_internal(::Type{T<:Integer}, ::AbstractString, ::Int64, ::Int64, ::Bool) where T<:Integer at parse.jl:375
  ...
Stacktrace:
 [1] parse(::Type{Complex}, ::String) at ./parse.jl:380
 [2] top-level scope at REPL[4]:100

julia> parse(Complex{Float64}, "1 + 1im") |> typeof
Complex{Float64}

julia> parse(Complex{Int32}, "1 + 1im") |> typeof
Complex{Int32}
```
My 2nd question: what if there are several matching patterns in the file? I tried some methods of `Base.parse`, and none of them seems to accept multiple patterns in `str`:
```julia
julia> parse(Complex{Int32}, "1 + 1im 2+2im")
ERROR: ArgumentError: invalid base 10 digit 'i' in " 1im 2+2"
Stacktrace:
 [1] tryparse_internal(::Type{Int32}, ::String, ::Int64, ::Int64, ::Int64, ::Bool) at ./parse.jl:132
 [2] tryparse_internal at ./parse.jl:375 [inlined]
 [3] tryparse_internal(::Type{Complex{Int32}}, ::String, ::Int64, ::Int64, ::Bool) at ./parse.jl:345
 [4] parse(::Type{Complex{Int32}}, ::String) at ./parse.jl:380
 [5] top-level scope at REPL[6]:1
```
So I guess I can only parse one `XXXX` at a time? How should I parse all of them in a file? Should I let users do

```julia
[parse(XXXX, m.match) for m in eachmatch(REGEX_OF_XXXX, str)]
```

or write a function

```julia
parse_xxxx(str) = [parse(XXXX, m.match) for m in eachmatch(REGEX_OF_XXXX, str)]
```

Should I call it `parse_xxxx` or `read_xxxx`?
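A sketch of the "write a function" option, under the same assumptions as before (three numbers per row, `Float64` elements): `eachmatch` iterates over the non-overlapping matches and each `RegexMatch` is turned into an `XXXX`. The names `XXXX_REGEX`, `xxxx_from_match`, `parse_all_xxxx`, and `read_all_xxxx` are hypothetical. In Base, `parse` roughly means "from a string" and `read` means "from an `IO` or file", so the `IO` method below uses a `read_` name:

```julia
struct XXXX{A<:AbstractMatrix}
    data::A
end

# Hypothetical pattern: a line "XXXX", then X/Y/Z rows with exactly three values each.
const XXXX_REGEX = r"^XXXX[ \t]*\r?\n[ \t]*X((?:[ \t]+\S+){3})[ \t]*\r?\n[ \t]*Y((?:[ \t]+\S+){3})[ \t]*\r?\n[ \t]*Z((?:[ \t]+\S+){3})"m

# Turn one match into an XXXX; each capture holds one row's three numbers.
function xxxx_from_match(m::RegexMatch)
    rows = [parse.(Float64, split(c)) for c in m.captures]
    return XXXX(permutedims(reduce(hcat, rows)))
end

# Every block in the string, in order of appearance; an empty vector if there are none.
parse_all_xxxx(str::AbstractString) = [xxxx_from_match(m) for m in eachmatch(XXXX_REGEX, str)]

# Same, but read the whole stream first (e.g. a file opened with `open`).
read_all_xxxx(io::IO) = parse_all_xxxx(read(io, String))
```

Returning an empty vector when nothing matches, rather than throwing, is one design choice; a caller that needs at least one block can check `isempty`.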
My 3rd question: what should happen if no pattern is found in the file? Should I throw a `Meta.ParseError`? According to its docs:

> The expression passed to the `parse` function could not be interpreted as a valid Julia expression.

But `str` comes from another program that has nothing to do with Julia. Or should I define a custom exception type `ParseFailure`?
```julia
struct ParseFailure <: Exception
    msg::String
end
```
Which is more idiomatic?
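For reference, Base's own `parse` throws `ArgumentError` on malformed input (as the `Complex{Int32}` trace above shows), and Base pairs `parse` with `tryparse`, which returns `nothing` instead of throwing. A sketch following that pattern with the `ParseFailure` type proposed above; the regex and the `Float64` element type are assumptions, and throwing `ArgumentError` instead of `ParseFailure` would work the same way:

```julia
# Custom exception type as proposed in the question, with a readable error message.
struct ParseFailure <: Exception
    msg::String
end
Base.showerror(io::IO, e::ParseFailure) = print(io, "ParseFailure: ", e.msg)

struct XXXX{A<:AbstractMatrix}
    data::A
end

# Hypothetical pattern: a line "XXXX", then X/Y/Z rows with exactly three values each.
const XXXX_REGEX = r"^XXXX[ \t]*\r?\n[ \t]*X((?:[ \t]+\S+){3})[ \t]*\r?\n[ \t]*Y((?:[ \t]+\S+){3})[ \t]*\r?\n[ \t]*Z((?:[ \t]+\S+){3})"m

# tryparse: return `nothing` if there is no block or a value is not a number.
function Base.tryparse(::Type{XXXX}, str::AbstractString)
    m = match(XXXX_REGEX, str)
    m === nothing && return nothing
    rows = Vector{Vector{Float64}}()
    for c in m.captures
        vals = tryparse.(Float64, split(c))
        any(isnothing, vals) && return nothing
        push!(rows, Float64.(vals))
    end
    return XXXX(permutedims(reduce(hcat, rows)))
end

# parse: same logic, but failure is an exception instead of `nothing`.
function Base.parse(::Type{XXXX}, str::AbstractString)
    x = tryparse(XXXX, str)
    x === nothing && throw(ParseFailure("no valid XXXX block found in input"))
    return x
end
```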
I would be grateful for your help!