Is there a way to read in these DataFrames such that I can do cross table queries later on?
I realised this in R by storing data into a named list of DataFrames. I don’t know what is the proper data structure and method in Julia to realise this… guidance is appreciated thanks!
I don’t know how you would do this in DataFrames or CSV.jl without pre-processing the data first. I am guessing you’d have to read it in a more primitive manner and pass the data off to the DataFrames constructor. You can store all the data frames in an array, i.e. something like Array{DataFrame, 1}().
Just curious to know how you did this in R? And did you do it with fread? I have a similar problem I am trying to solve.
yes I readChar the whole file as single string, split by “%T\t” into chunks, then regmatches to get table names in a vector, then fread each chunk by skipping 1st row (which is table name) and set header=TRUE (as 2nd row is table field).
sounds reasonable, how about the data structure? using a Dict with table name as key and DataFrame as value? the goal is to correlate these DataFrames later…sort of like linked tables, this I must be able to access each by their name
I just store the DataFrames in a Vector, but that was just my choice for that case. You could use a Dict for sure. Especially if your files are mostly uniform (mine we not, some had 1 table others had 10).
it was meant for SQLite database. just I have too many files, and to import them one by one into SQLite kills the purpose.
By the way, if i had them all in a SQLite database, what should I do to connect to the DB and query the tables from Julia?
just keep everything in individual csv files, read the individual csv files one at a time in julia, and push them into the SQLite database… Then run your complicated joins and grab the results!