I am writing a simulation that captures a lot of statistics from a mathematical model. As a result I have an array that is long (around 100,000 rows) but also very wide (around 100 columns). I am moving the simulation from Python to Julia because of slow performance in Python.
I am still new to Julia, so I don’t know the ins and outs of the different array libraries. One thing I would like is the ability to index or slice the array by column name, simply because with so many columns it is easy to make mistakes with integer column indexes.
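For concreteness, this is roughly how I handle column names in my current Python code (simplified, and the column names here are made up for illustration):

```python
import numpy as np

# Illustrative column names; my real array has ~100 of these.
COLS = ["energy", "momentum", "temperature"]
COL_IDX = {name: i for i, name in enumerate(COLS)}

stats = np.zeros((100_000, len(COLS)))

# Writing and reading a column by name avoids hard-coded integer indexes.
stats[:, COL_IDX["temperature"]] = 300.0
temps = stats[:, COL_IDX["temperature"]]
```

This works, but maintaining the name-to-index dict by hand is exactly the kind of bookkeeping I was hoping a named-array type in Julia would handle for me.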
The other thing I have to do is run multiple simulations, which I stack into a volume, so I usually end up with arrays on the order of 100,000 × 100 × 1,000, where the final dimension is the number of simulations. The final step is computing summaries over the volume, such as means and standard deviations.
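In my current Python code the stacking and summary step looks roughly like this (scaled down here so it runs quickly; the sizes and variable names are just for illustration):

```python
import numpy as np

n_rows, n_cols, n_sims = 1_000, 10, 5  # real sizes: ~100,000 x 100 x 1,000
rng = np.random.default_rng(0)

# Each simulation produces one (rows x cols) statistics array;
# stacking along a new last axis builds the volume.
sims = [rng.normal(size=(n_rows, n_cols)) for _ in range(n_sims)]
volume = np.stack(sims, axis=-1)       # shape (n_rows, n_cols, n_sims)

# Summaries across simulations, i.e. reducing over the last dimension.
mean_over_sims = volume.mean(axis=-1)  # shape (n_rows, n_cols)
std_over_sims = volume.std(axis=-1, ddof=1)
```

So whatever structure I pick in Julia needs to support both the per-simulation writes and these reductions over the simulation dimension.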
I saw that both NamedArrays.jl and DataFrames.jl support named indexing. I was trying to figure out what the performance differences and potential issues might be in choosing one package over the other. DataFrames.jl seems to be popular and has a very familiar named-indexing interface, but I was not sure whether that library is meant for lots of “writes” to the array. NamedArrays.jl also seems good, but I was not sure how actively developed the project is, since I have not seen many posts about it on the Discourse forum lately.
I also could not find any performance comparisons between NamedArrays.jl and DataFrames.jl, especially in the context of wide arrays. I did find a post about performance in DataFrames.jl, but it seems the DataFramesMeta.jl package may have improved DataFrames performance since then.
Since the reason I am migrating from Python to Julia is slow performance, I was hoping someone could set me on the right track in terms of an efficient Julia data structure for this workload. Thanks.