Parsing a bespoke file format

I have a text file with the following structure:

begin"head"
    <head1 content>
end"head"
begin"body"
    <body1 content>
end"body"
begin"foot"
    <foot1 content>
end"foot"
begin"head"
    <head2 content>
end"head"
begin"body"
    <body2 content>
end"body"
begin"foot"
    <foot2 content>
end"foot"
...

Such files are collections of multiple datasets, where each dataset is composed of one head, one body, and one foot.
Naturally, such files should be processed concurrently, with each dataset handled independently.

Is there a way to tell CSV.jl to split the file into chunks at every begin"head"?

Not sure why CSV.jl should read this type of file, but why not consider writing your own file parser?

Agreed, this file looks very non-CSV in format. It's unclear whether the data is even tabular.

@rafael.guerra @StefanKarpinski, yeah, sorry, I hid the only part of the file that actually looks like a CSV.
Basically, anything inside a begin ... end block is composed of items separated by " (yes, somebody actually chose that character as a separator…).

My idea was to use CSV.jl with the transpose option and strip out all the begin...end lines. But since each line carries a different amount of information (the column counts are not consistent), it may indeed not be well suited to such files.
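
Concretely, the idea was something along these lines (an untested sketch; `data.txt` is just a placeholder name):

```julia
using CSV

# Drop the begin"..."/end"..." marker lines and hand the remaining lines to
# CSV.jl with `"` as the delimiter. `quoted=false` tells CSV.jl not to treat
# that same character as a quoting character.
lines = filter(readlines("data.txt")) do line
    !startswith(line, "begin\"") && !startswith(line, "end\"")
end
f = CSV.File(IOBuffer(join(lines, '\n')); delim='"', quoted=false, transpose=true)
```

But as said above, the varying number of fields per line makes this fragile at best.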

I just figured CSV.jl was well developed by now and might be of help for this.

I think this is different enough from CSV that trying to get a CSV reader to process it will be much more pain than help. I would write a loop that processes the file format line by line using regular expressions and split to extract the data.
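
For example, something along these lines (an untested sketch: it assumes every block is opened by begin"name" and closed by end"name", and that fields inside a block are separated by the " character; parse_datasets is just an illustrative name):

```julia
# Collect each head/body/foot group into a Dict, starting a new dataset
# whenever a new begin"head" is encountered.
function parse_datasets(path::AbstractString)
    datasets = Vector{Dict{String, Vector{Vector{String}}}}()
    dataset = Dict{String, Vector{Vector{String}}}()
    current = nothing                        # name of the block we are inside

    for raw in eachline(path)
        line = strip(raw)
        isempty(line) && continue

        m = match(r"^begin\"(\w+)\"$", line)
        if m !== nothing
            current = String(m.captures[1])
            # a new head starts a new dataset
            if current == "head" && !isempty(dataset)
                push!(datasets, dataset)
                dataset = Dict{String, Vector{Vector{String}}}()
            end
            dataset[current] = Vector{String}[]
            continue
        end

        if occursin(r"^end\"\w+\"$", line)
            current = nothing
            continue
        end

        current === nothing && continue      # ignore lines outside any block
        # fields within a block are separated by the `"` character
        push!(dataset[current], String.(split(line, '"')))
    end

    isempty(dataset) || push!(datasets, dataset)
    return datasets
end
```

Once the datasets are collected, each one can be processed independently, e.g. with `Threads.@threads for d in parse_datasets("data.txt") ... end`.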

All right, thanks for the advice!