I plan to run a sequence of many (on the order of 100) simulations, each of which outputs an array of floating-point numbers (on the order of 10k values each). I want to store these arrays together with metadata about the parameters used in the simulation that produced them.
I know this is a simple question, but I just want to ask for a good way to organize this data.
One thought I had was for each simulation to correspond to a row in a DataFrame, with columns describing the simulation parameters and an additional column holding the actual simulation data. But I am not sure it is “correct” for one column to be a large array of numbers while the others are simple strings and integers (maybe this is standard usage; I am really new to DataFrames).
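To make this concrete, here is the kind of thing I had in mind, sketched with pandas and NumPy (the parameter names `seed` and `dt` are just placeholders, and the random arrays stand in for real simulation output):

```python
import numpy as np
import pandas as pd

# Placeholder parameter sweep: each tuple is one simulation's parameters
rows = []
for seed, dt in [(0, 0.1), (1, 0.1), (2, 0.05)]:
    # Stand-in for the real simulation output (~10k floats)
    result = np.random.default_rng(seed).normal(size=10_000)
    rows.append({"seed": seed, "dt": dt, "result": result})

# The "result" column ends up with object dtype, holding one array per row
df = pd.DataFrame(rows)
print(df.dtypes)
```

This works mechanically, but the array column is object-dtype, which is part of why I am unsure whether it counts as idiomatic DataFrame usage.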
Something else I thought about was to have a “main” DataFrame of just the metadata, stored as one CSV, with a “data” column containing filenames (one file per simulation) where the arrays are stored.
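Sketching that second idea too (again with placeholder parameter names, a made-up `runs/` directory, and `.npy` files as one possible per-simulation format):

```python
import numpy as np
import pandas as pd
from pathlib import Path

outdir = Path("runs")  # hypothetical output directory
outdir.mkdir(exist_ok=True)

meta = []
for seed, dt in [(0, 0.1), (1, 0.05)]:
    # Stand-in for the real simulation output
    result = np.random.default_rng(seed).normal(size=10_000)
    fname = f"seed{seed}_dt{dt}.npy"
    np.save(outdir / fname, result)
    meta.append({"seed": seed, "dt": dt, "file": fname})

# One human-readable index of all runs
pd.DataFrame(meta).to_csv(outdir / "index.csv", index=False)

# Weeks later: look up a run by its parameters and reload its array
index = pd.read_csv(outdir / "index.csv")
arr = np.load(outdir / index.loc[0, "file"])
```

The appeal for me is that `index.csv` stays small and readable in any text editor, while the arrays live in their own files.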
Any help / advice would be appreciated. Thanks so much!
By the way, in case it wasn’t clear, my objective function here is based on aesthetics. I am not dealing with that much data, so I am not worried about I/O performance or anything like that. I just want to organize things in such a way that it will be easy for me to come back after a few weeks and remember how things are arranged and what all the numbers mean. For reference, my current system is output files with strange names and numbers that I can’t parse, so anything better than that is an improvement!