I have a text file with multiple DataFrames in the following format (tab separated):
%F col1 col2 col3
%R 1 2 3
%F col1 col2 col3 col4
%R 1 2 3 4
%R 5 6 7 hello
column names are correlated among tables.
Is there a way to read in these DataFrames such that I can do cross table queries later on?
I realised this in R by storing data into a named list of DataFrames. I don’t know what is the proper data structure and method in Julia to realise this… guidance is appreciated thanks!
I don’t know how you would do this in
CSV.jl without pre-processing the data first. I am guessing you’d have to read it in a more primitive manner and pass the data off to the
DataFrames constructor. You can store all the data frames in an array, i.e. something like
Just curious to know how you did this in R? And did you do it with
fread? I have a similar problem I am trying to solve.
You cannot do this in a single step as @affans said. I did something similar in one of my packages (see this function https://github.com/tbeason/FamaFrenchData.jl/blob/master/src/FamaFrenchData.jl#L122).
I read in the whole file with
readlines, find the chunks with tables, then pass those to CSV.jl. You likely need to do something similar.
yes I readChar the whole file as single string, split by “%T\t” into chunks, then regmatches to get table names in a vector, then fread each chunk by skipping 1st row (which is table name) and set header=TRUE (as 2nd row is table field).
sounds reasonable, how about the data structure? using a Dict with table name as key and DataFrame as value? the goal is to correlate these DataFrames later…sort of like linked tables, this I must be able to access each by their name
I just store the DataFrames in a Vector, but that was just my choice for that case. You could use a Dict for sure. Especially if your files are mostly uniform (mine we not, some had 1 table others had 10).
In the future, just put your data in separate tables of a SQLite database.
it was meant for SQLite database. just I have too many files, and to import them one by one into SQLite kills the purpose.
By the way, if i had them all in a SQLite database, what should I do to connect to the DB and query the tables from Julia?
See docs here: https://juliadatabases.github.io/SQLite.jl/stable/
just keep everything in individual csv files, read the individual csv files one at a time in julia, and push them into the SQLite database… Then run your complicated joins and grab the results!