When doing research, I like setting up experiments in Julia to verify my theoretical findings or to simply make sense of the concepts I’m dealing with. The point is, my experiments are usually simple, small and I’d like to set them up and get data quickly.
Now, my problem is that as my experiments grow in complexity, it becomes cumbersome to store and handle the data generated by these simulations in a systematic way. I couldn’t find any information on how to store data in a “proper” way here or on other Julia forums.
As an example, I have a function run_test(params...) that takes around 6 parameters and outputs a vector whose length is set by one of the parameters (the entries of the vector are 0, 1 or nothing, if that makes a difference). Say I want to run this test 1000 times for each combination of the 2 parameters that I’m varying (while keeping the other parameters fixed).
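To make the setup concrete, here is a minimal sketch of what I mean. The body of run_test and all parameter names (n, p, q, a, b, c) are placeholders I made up for this post, not my actual code; only the shape of the problem matters.

```julia
using Random

# Hypothetical stand-in for my real function: the six parameter names are
# placeholders. `n` sets the output length; entries are 0, 1 or `nothing`.
function run_test(n::Int, p::Float64, q::Float64, a, b, c; rng = Random.default_rng())
    out = Vector{Union{Int, Nothing}}(undef, n)
    for i in 1:n
        r = rand(rng)
        # With probability q the entry is missing, otherwise it is 1 with probability p.
        out[i] = r < q ? nothing : (r < q + (1 - q) * p ? 1 : 0)
    end
    return out
end

ns = [5, 10]        # first varied parameter (output length)
ps = [0.2, 0.8]     # second varied parameter
nruns = 1000        # repetitions per combination

# One batch of `nruns` output vectors per (n, p) combination; q, a, b, c stay fixed.
batches = [(n, p, [run_test(n, p, 0.1, 1, 2, 3) for _ in 1:nruns])
           for n in ns for p in ps]
```

So the raw data is a collection of batches, each tagged with the varied parameters, and the question is what container to put this in.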
Should I create a DataFrame in which each row of data represents one run and contains the value of all parameters? Should my output vector be stored in a single cell of the DataFrame? Is it better to define a lot of columns to store each entry of the vector (provided I know how big it will ever get)?
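A rough sketch of the "one row per run, output vector in a single cell" layout, written with plain named tuples so it runs without any packages. The two-argument run_test here is a simplified placeholder. As far as I understand, a vector of named tuples like this can be passed to the DataFrame constructor, but I have left that step out.

```julia
# Hypothetical, simplified stand-in for run_test; only two parameters are shown.
run_test(n, p) = Union{Int, Nothing}[rand() < p ? 1 : 0 for _ in 1:n]

# One row per run: the parameter values plus the whole output vector in one field.
rows = [(n = n, p = p, run = r, output = run_test(n, p))
        for n in (5, 10) for p in (0.2, 0.8) for r in 1:3]

# Select the runs for one parameter combination, then summarise each run.
selected = filter(row -> row.n == 5 && row.p == 0.2, rows)
ones_per_run = [count(==(1), row.output) for row in selected]
```

The alternative of one column per vector entry would mean padding shorter outputs up to the largest length I ever expect.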
Or should I store each batch of 1000 runs as a vector of vectors and put them in a Dict, with the key describing the setup? I feel this is more efficient but harder to manipulate.
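For comparison, a sketch of the Dict layout, keyed by a tuple of the two varied parameters. Again, run_test is a simplified placeholder and the key layout (n, p) is just one possible choice.

```julia
# Hypothetical, simplified stand-in for run_test; only two parameters are shown.
run_test(n, p) = Union{Int, Nothing}[rand() < p ? 1 : 0 for _ in 1:n]

# Key: the (n, p) setup. Value: the 1000 output vectors for that setup.
results = Dict{Tuple{Int, Float64}, Vector{Vector{Union{Int, Nothing}}}}()
for n in (5, 10), p in (0.2, 0.8)
    results[(n, p)] = [run_test(n, p) for _ in 1:1000]
end

# Looking up one setup is direct, but any summary across setups is a manual loop.
mean_ones = Dict(k => sum(count(==(1), v) for v in runs) / length(runs)
                 for (k, runs) in results)
```

This is what I mean by "harder to manipulate": every grouping or filtering step has to be written by hand.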
In general, is it worth creating a struct to hold the parameters or the output? Is it a good idea to store structs in a DataFrame or other data structures?
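What I have in mind with a parameter struct is something like the sketch below. TestParams and its fields are hypothetical names, and run_test is again a simplified placeholder.

```julia
# Hypothetical parameter struct; field names and defaults are placeholders.
Base.@kwdef struct TestParams
    n::Int = 10        # output length
    p::Float64 = 0.5   # probability of a 1
end

run_test(prm::TestParams) = Union{Int, Nothing}[rand() < prm.p ? 1 : 0 for _ in 1:prm.n]

# Vary two fields; any field not mentioned keeps its default.
setups = [TestParams(n = n, p = p) for n in (5, 10) for p in (0.2, 0.8)]

# Immutable structs with plain bits fields compare and hash by value,
# so a freshly constructed TestParams finds the stored entry.
results = Dict(s => [run_test(s) for _ in 1:100] for s in setups)
runs = results[TestParams(n = 5, p = 0.2)]
```

The appeal is that the fixed parameters live in one place with defaults, instead of being repeated in every call.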
What if I want to keep a log of all past experiments? Should I save and load CSVs with DataFrames? Save the Dicts with JLD2?
I hope these questions give an idea of what I’m struggling with. I feel like there must be good practices for working with data, and I’d be very grateful for any pointers in that direction. Perhaps this is a more general question, but I wonder if there is any advice that is specific to Julia.
Thanks a lot in advance!
Stas