How to parse text between specific characters in Julia?

Luigi_Marongiu · August 28, 2023, 9:45am

Hello,
I would like to create a text file (a plain text .txt or a tab-separated .tsv file) with modular information. I have been thinking of enclosing the information between markers, for instance *= and =*, so that the text within these markers works as a unit and multiple units can be placed in the file. Something like this:

*=
NAME_Abel
DOB_03-09-1960
=*
*=
NAME_Bernard
DOB_12-12-1972
=*
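Each line inside a group follows a KEY_value pattern. Below is a minimal sketch of turning one group's lines into a `Dict`. It assumes the key is everything before the first underscore. `parse_fields` is a hypothetical helper name. DOB is kept as a string because the sample does not say whether 03-09-1960 is day-month or month-day.

```julia
# Hypothetical helper: turn the lines of one group into a Dict.
# Assumes every non-blank line has the form KEY_value and splits at the FIRST "_".
function parse_fields(lines)
    fields = Dict{String,String}()
    for line in lines
        s = strip(line)
        isempty(s) && continue                      # tolerate blank lines
        occursin('_', s) || error("expected KEY_value, got: $s")
        key, value = split(s, '_'; limit = 2)
        fields[key] = value                         # SubStrings are converted to String
    end
    return fields
end

fields = parse_fields(["NAME_Abel", "DOB_03-09-1960"])
println(fields["NAME"])   # Abel
println(fields["DOB"])    # 03-09-1960
```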

Is it possible to load these chunks of text one group at a time in Julia? The idea is to read the file, parse the first *= ... =* group, then the next one, and so forth.
Thank you
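For reference, here is a minimal sketch of reading such a file one group at a time using only Base Julia (`eachline` plus a `Channel`). It assumes each group opens with a line containing only `*=` and closes with a line containing only `=*`, as described above. `each_group` and its keyword arguments are hypothetical names, not an existing API.

```julia
# Hypothetical helper: lazily yield the groups of a marker-delimited stream,
# one Vector{String} (the lines of a single group) at a time.
# Lines outside any group are ignored.
function each_group(io::IO; open_marker = "*=", close_marker = "=*")
    return Channel{Vector{String}}() do ch
        current = nothing                 # `nothing` means we are outside a group
        for line in eachline(io)
            s = strip(line)
            if current === nothing
                s == open_marker && (current = String[])
            elseif s == close_marker
                put!(ch, current)         # hand one finished group to the consumer
                current = nothing
            else
                push!(current, String(s))
            end
        end
        current === nothing || error("unterminated group at end of input")
    end
end

text = """
*=
NAME_Abel
DOB_03-09-1960
=*
*=
NAME_Bernard
DOB_12-12-1972
=*
"""

for group in each_group(IOBuffer(text))
    println(group)
end
```

To read from disk, wrap the loop in `open("groups.txt") do io ... end` (the filename is a placeholder). If a file uses `*=` for both the opening and the closing line, pass `close_marker = "*="`; the reader then alternates between opening and closing on each marker.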

oheil · August 28, 2023, 10:18am

How about using XML for your task?
Example:

julia> using LightXML

julia> xdoc = parse_file("xml_example.txt");

julia> xroot = root(xdoc);

julia> for ele in get_elements_by_tagname(xroot, "group")
       println(content(ele))
       end

NAME_Abel
DOB_03-09-1960


NAME_Bernard
DOB_12-12-1972

The file xml_example.txt looks like:

<?xml version="1.0" encoding="UTF-8"?>
<groups>
<group>
NAME_Abel
DOB_03-09-1960
</group>
<group>
NAME_Bernard
DOB_12-12-1972
</group>
</groups>
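Building on this answer, the following sketch goes one step further and splits each `<group>`'s text into KEY => value pairs. It uses LightXML's `parse_string` instead of `parse_file` so the example is self-contained. It also assumes the KEY_value convention from the question, splitting at the first underscore.

```julia
using LightXML

# The same document as xml_example.txt, inlined so the sketch is self-contained.
xml = """
<?xml version="1.0" encoding="UTF-8"?>
<groups>
<group>
NAME_Abel
DOB_03-09-1960
</group>
<group>
NAME_Bernard
DOB_12-12-1972
</group>
</groups>
"""

xdoc = parse_string(xml)
records = Dict{String,String}[]          # one Dict per <group>
for ele in get_elements_by_tagname(root(xdoc), "group")
    fields = Dict{String,String}()
    for line in split(content(ele), '\n')
        s = strip(line)
        isempty(s) && continue           # skip the blank lines around the text
        key, value = split(s, '_'; limit = 2)   # assumes every line contains "_"
        fields[key] = value
    end
    push!(records, fields)
end
free(xdoc)                               # LightXML documents are freed explicitly

println(records[1]["NAME"], " ", records[2]["DOB"])
```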

stevengj · August 28, 2023, 12:49pm

Although you could definitely write a parser in Julia for any format you want, like @oheil I would encourage you to use a standard format, via one of Julia’s many existing parsers, rather than writing your own.

For structured data in a human-readable text-based format, a common choice is JSON format (using JSON.jl). XML is also an option, but in my opinion that’s overly complicated for relatively simple data structures.
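To illustrate the JSON suggestion, here is a minimal sketch with JSON.jl. It stores the two records from the question as an array of objects and reads them back. The field names `name` and `dob` are illustrative assumptions. With this layout each group is simply one element of the parsed array, so no custom marker parsing is needed.

```julia
using JSON

# The two records from the question; the key names are illustrative choices.
records = [
    Dict("name" => "Abel", "dob" => "03-09-1960"),
    Dict("name" => "Bernard", "dob" => "12-12-1972"),
]

# Serialize to a JSON string (use `write(path, str)` to save it to a file).
str = JSON.json(records)

# Parse it back: a JSON array of objects becomes a vector of dictionary-like objects.
parsed = JSON.parse(str)
for rec in parsed
    println(rec["name"], " born ", rec["dob"])
end
```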

There are also binary formats, like BSON, the binary analogue of JSON (via BSON.jl), Protocol Buffers (using ProtoBuf.jl), the HDF5 format (via HDF5.jl), which is more oriented towards large numeric arrays, and the JLD format (a Julia-oriented format built on HDF5, via JLD.jl or JLD2.jl), among many others.

Luigi_Marongiu · August 28, 2023, 3:01pm

Thank you. I’ll look into them.
