How to parse text between specific characters in Julia?

Hello,
I would like to create a text file (a plain-text .txt or a tab-separated .tsv) with modular information. I have been thinking of enclosing the information within markers, for instance *= and =*, so that the text between a pair of markers works as a unit and multiple units can be placed in the file. Something like this:

*=
NAME_Abel
DOB_03-09-1960
=*
*=
NAME_Bernard
DOB_12-12-1972
=*

Is it possible to load these chunks of text one group at a time in Julia? The idea is to read the file, parse the first group *= ... =*, then parse the next one, and so forth.
Thank you

How about using XML for your task?
Example:

julia> using LightXML

julia> xdoc = parse_file("xml_example.txt");

julia> xroot = root(xdoc);

julia> for ele in get_elements_by_tagname(xroot, "group")
       println(content(ele))
       end

NAME_Abel
DOB_03-09-1960


NAME_Bernard
DOB_12-12-1972

The file xml_example.txt looks like:

<?xml version="1.0" encoding="UTF-8"?>
<groups>
<group>
NAME_Abel
DOB_03-09-1960
</group>
<group>
NAME_Bernard
DOB_12-12-1972
</group>
</groups>

Although you could definitely write a parser in Julia for any format you want, like @oheil I would encourage you to use a standard format, via one of Julia’s many existing parsers, rather than writing your own.
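For completeness, a hand-written parser for the format in the question could look roughly like the sketch below. It uses only Base Julia and assumes that *= opens a group, =* closes it, and each line inside has the form KEY_VALUE. The function name parse_groups and the sample string are made up for illustration.

```julia
# Hypothetical helper: collect every block enclosed by *= ... =* markers.
function parse_groups(text::AbstractString)
    groups = Vector{Dict{String,String}}()
    # (?s) lets `.` match newlines; `.*?` is lazy, so each match ends at the first `=*`.
    for m in eachmatch(r"(?s)\*=(.*?)=\*", text)
        record = Dict{String,String}()
        for raw in split(m.captures[1], '\n')
            entry = strip(raw)
            # Skip blank lines and lines without a KEY_VALUE separator.
            (isempty(entry) || !occursin('_', entry)) && continue
            # Assumption: the key ends at the first underscore.
            key, value = split(entry, '_'; limit=2)
            record[key] = value
        end
        push!(groups, record)
    end
    return groups
end

# Sample data in the format described in the question.
sample = """
*=
NAME_Abel
DOB_03-09-1960
=*
*=
NAME_Bernard
DOB_12-12-1972
=*
"""

for group in parse_groups(sample)
    println(group["NAME"], " was born on ", group["DOB"])
end
```

To parse a file instead of a string, the contents can be read with read(path, String) and passed to the same function. Note that this sketch silently ignores malformed lines and unclosed groups, which is one of the reasons an existing format with a tested parser is usually the safer choice.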

For structured data in a human-readable text-based format, a common choice is JSON format (using JSON.jl). XML is also an option, but in my opinion that’s overly complicated for relatively simple data structures.
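As a rough illustration, the two records from the question could be stored as a JSON array and read back as follows. This is a sketch that assumes the JSON.jl package is installed; the field names and the layout of the JSON are just one possible choice.

```julia
using JSON  # assumes the JSON.jl package is installed

# The two records from the question, rewritten as a JSON array of objects.
# The field names are an illustrative choice, not something fixed by the format.
json_text = """
[
  {"NAME": "Abel",    "DOB": "03-09-1960"},
  {"NAME": "Bernard", "DOB": "12-12-1972"}
]
"""

# JSON.parse turns the text into nested Julia containers.
people = JSON.parse(json_text)

# Each element can be indexed by field name, one record at a time.
for person in people
    println(person["NAME"], ": ", person["DOB"])
end

# JSON.json goes the other way and produces a JSON string from Julia data.
println(JSON.json(people))
```

Each object plays the role of one *= ... =* group, so no custom markers or parsing code are needed.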

There are also binary formats, such as BSON, the binary analogue of JSON (via BSON.jl); Protocol Buffers (using ProtoBuf.jl); the HDF5 format (via HDF5.jl), which is more oriented towards large numeric arrays; and the JLD format, a Julia-oriented wrapper around HDF5 (via JLD.jl or JLD2.jl), among many others.


Thank you. I’ll look into them.