Read multiple DataFrames stored in a single text file

I have a text file with multiple DataFrames in the following format (tab separated):

%T table1
%F col1 col2 col3
%R 1 2 3
%T table2
%F col1 col2 col3 col4
%R 1 2 3 4
%R 5 6 7 hello

column names are correlated among tables.

Is there a way to read in these DataFrames such that I can do cross table queries later on?

I realised this in R by storing data into a named list of DataFrames. I don’t know what is the proper data structure and method in Julia to realise this… guidance is appreciated thanks!

I don’t know how you would do this in DataFrames or CSV.jl without pre-processing the data first. I am guessing you’d have to read it in a more primitive manner and pass the data off to the DataFrames constructor. You can store all the data frames in an array, i.e. something like Array{DataFrame, 1}().

Just curious to know how you did this in R? And did you do it with fread? I have a similar problem I am trying to solve.

1 Like

You cannot do this in a single step as @affans said. I did something similar in one of my packages (see this function https://github.com/tbeason/FamaFrenchData.jl/blob/master/src/FamaFrenchData.jl#L122).

I read in the whole file with readlines, find the chunks with tables, then pass those to CSV.jl. You likely need to do something similar.

1 Like

yes I readChar the whole file as single string, split by “%T\t” into chunks, then regmatches to get table names in a vector, then fread each chunk by skipping 1st row (which is table name) and set header=TRUE (as 2nd row is table field).

sounds reasonable, how about the data structure? using a Dict with table name as key and DataFrame as value? the goal is to correlate these DataFrames later…sort of like linked tables, this I must be able to access each by their name

I just store the DataFrames in a Vector, but that was just my choice for that case. You could use a Dict for sure. Especially if your files are mostly uniform (mine we not, some had 1 table others had 10).

In the future, just put your data in separate tables of a SQLite database.

it was meant for SQLite database. just I have too many files, and to import them one by one into SQLite kills the purpose.
By the way, if i had them all in a SQLite database, what should I do to connect to the DB and query the tables from Julia?

using SQLite

See docs here: Home · SQLite.jl

just keep everything in individual csv files, read the individual csv files one at a time in julia, and push them into the SQLite database… Then run your complicated joins and grab the results!

1 Like