I have a text file with the following structure:
begin"head"
<head1 content>
end"head"
begin"body"
<body1 content>
end"body"
begin"foot"
<foot1 content>
end"foot"
begin"head"
<head2 content>
end"head"
begin"body"
<body2 content>
end"body"
begin"foot"
<foot2 content>
end"foot"
...
Such files are collections of multiple datasets, where one dataset is composed of one head, one body, and one foot.
Obviously, such files should be handled in a concurrent manner, each dataset independently.
Is there a way to tell CSV.jl to split the file into chunks at every begin"head"?
Not sure why CSV.jl should read this type of file, but why not consider writing your own file parser?
Agreed, this file looks very non-CSV in format. It's unclear whether the data is even tabular.
@rafael.guerra @StefanKarpinski, yeah sorry, I hid the only part of the file that looks like a CSV.
Basically, anything inside a begin ... end block is composed of items separated by " (yes, somebody chose that character as a separator…).
My idea was to use CSV.jl with the transpose option and get rid of all the begin...end lines. But since each line carries a different amount of information (there is no consistency in column counts), it may indeed not be suited for such files.
I just figured CSV.jl is well developed by now and might be of help for this.
I think this is different enough from CSV that trying to get a CSV reader to process it will be much more pain than help. I would write a loop that processes the file format line by line using regular expressions and split to extract the data.
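For example, a rough sketch of such a loop (assuming the begin"…"/end"…" markers and "-separated items shown above; read_datasets is just an illustrative name, not something from a package) could look like:

# Rough sketch only: collect each dataset as a Dict mapping the section name
# ("head", "body", "foot") to its lines, with each line split on the `"` separator.
function read_datasets(path::AbstractString)
    datasets = Dict{String,Vector{Vector{String}}}[]
    section = nothing
    for line in eachline(path)
        m = match(r"^begin\"(\w+)\"$", line)
        if m !== nothing
            section = m.captures[1]
            # a new head section marks the start of a new dataset
            section == "head" && push!(datasets, Dict{String,Vector{Vector{String}}}())
            datasets[end][section] = Vector{String}[]
        elseif startswith(line, "end\"")
            section = nothing
        elseif section !== nothing
            push!(datasets[end][section], String.(split(line, '"')))
        end
    end
    return datasets
end

datasets = read_datasets("data.txt")

Each element of the resulting datasets vector is then self-contained, so the datasets could be processed independently afterwards, e.g. with Threads.@threads or asyncmap.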
All right, thanks for the advice!