I was wondering if DataFrames.jl can handle multi-dimensional arrays, meaning an array with 3 or more dimensions.
I can run

```julia
using DataFrames
DataFrame(rand(3,4))
```

and that works fine.
But if I try:

```julia
DataFrame(rand(3,4,5))
```

then I get a rather cryptic error message:

```
ERROR: ArgumentError: 'Array{Float64,3}' iterates 'Float64' values, which don't satisfy the Tables.jl Row-iterator interface
```
Does anyone know if there is a plan to add multi-dimensional support to the package? I checked the GitHub issue list and did not see any issues that explicitly mentioned this, though I did not dig too deeply, so I thought I would ask. Thanks.
Currently DataFrames.jl supports only data that can be represented as a two-dimensional object (i.e., a list of column vectors). There are no plans to change this at the moment.
However, note that you can easily store a value of any type in a cell of a DataFrame, so, for example, you can store each 2-D slice of a 3-D array as a matrix in its own cell.
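A minimal sketch of that idea, assuming one row per slice along the third dimension (the column names `k` and `slice` are hypothetical):

```julia
using DataFrames

A = rand(3, 4, 5)  # a 3×4×5 array, as in the question

# One row per slice along the third dimension;
# each cell of the :slice column holds a 3×4 matrix.
df = DataFrame(k = 1:size(A, 3), slice = [A[:, :, k] for k in axes(A, 3)])

size(df)                    # (5, 2)
df.slice[2] == A[:, :, 2]   # true
```

The table itself stays two-dimensional; the extra dimensions simply live inside the cells.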
I work a lot with panel data, for which a "3D" representation often seems natural at first. In this context, I always find it interesting that Python's pandas is named after panel data and indeed used to have a Panel object in addition to the standard DataFrame, but Panel has been deprecated for a while now in favour of multi-indices on a regular DataFrame (see the pandas.Panel page of the pandas 0.23.4 documentation).
@nilshg this multi-indexing idea is interesting, but it seems rather confusing. I get the basic idea: instead of keeping a 3-D volume, you flatten it into a 2-D table and use a multi-index to recover or iterate over the 3-D structure. Say I run a model 100 times and each run covers 10 years. Then I would normally have a volume indexed as [year, parameters, run]: each run is a slice of 10 rows by n columns, and the depth dimension refers to the run number. Instead, I could convert this to a 2-D table with a "year" column and a "run" column, so that (run, year) would be the multi-index.
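A minimal sketch of that conversion in DataFrames.jl, assuming the model output is stored in an array `results` of size (years, parameters, runs); the sizes and the parameter column names `p1`, `p2`, `p3` are hypothetical:

```julia
using DataFrames

nyears, nparams, nruns = 10, 3, 100
results = rand(nyears, nparams, nruns)   # hypothetical model output: [year, parameter, run]

# Stack the per-run 10×3 slices vertically, so rows are ordered run by run.
flat = reduce(vcat, [results[:, :, r] for r in 1:nruns])   # (nyears*nruns) × nparams

df = DataFrame(flat, [:p1, :p2, :p3])        # hypothetical parameter names
df.run  = repeat(1:nruns, inner = nyears)    # 1,1,...,1,2,2,...
df.year = repeat(1:nyears, outer = nruns)    # 1,2,...,10,1,2,...

# The (run, year) pair now plays the role of the multi-index:
row = df[(df.run .== 7) .& (df.year .== 3), :]
```

Here `row.p1[1]` equals `results[3, 1, 7]`; the "index" is just two ordinary columns.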
But is there a good explanation of how to use multi-indices in a Julia DataFrame? As far as I remember, the Python pandas documentation was never very clear about this, or at least I always ran into issues with implicit conversions that turned things into multi-indices, and then I had to get them back out of the multi-index. So if there is a good explanation of this, please pass along the link. I will check out the Julia DataFrames docs in the meantime.
Julia DataFrames don't have indexes; you can use `by` and `groupby` on any column (or combination of columns) and get the same effect.
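As a rough sketch of what that looks like, assuming a small hypothetical long-format table with `run`, `year` and a value column `x`. Note that in DataFrames.jl 1.x the `by` function no longer exists, and the same operation is written as `combine(groupby(...))`:

```julia
using DataFrames, Statistics

# Hypothetical long-format table: one row per (run, year), one value column :x.
df = DataFrame(run  = repeat(1:3, inner = 4),
               year = repeat(1:4, outer = 3),
               x    = 1.0:12.0)

# Rough equivalent of a pandas groupby over the "run" level of a (run, year) multi-index:
per_run = combine(groupby(df, :run), :x => mean => :mean_x)
# per_run.mean_x == [2.5, 6.5, 10.5]

# Selecting one "index value" is just a row filter on the key column:
run2 = subset(df, :run => ByRow(==(2)))
```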
This actually makes things much clearer than in pandas, though potentially slower if you perform the same grouping again and again.
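One way to avoid regrouping repeatedly (a sketch, using the same kind of hypothetical table) is to keep the `GroupedDataFrame` returned by `groupby` and look groups up by their key values. Indexing a `GroupedDataFrame` with a `NamedTuple` of grouping-column values is supported in recent DataFrames.jl versions; older releases may behave differently.

```julia
using DataFrames

# Hypothetical long-format table: one row per (run, year).
df = DataFrame(run  = repeat(1:3, inner = 4),
               year = repeat(1:4, outer = 3),
               x    = 1.0:12.0)

# Group once; the GroupedDataFrame can then be reused for many lookups,
# playing a role similar to an index on (run, year).
gd = groupby(df, [:run, :year])

sub = gd[(run = 2, year = 3)]   # SubDataFrame holding the matching row
sub.x[1]                        # 7.0
```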
(Multi-indexes used to be super complex in pandas because indexes are like columns, but not quite.)