Can DataFrames.jl handle multi-dimensional arrays?

00krishna · November 5, 2019, 7:06am

I was wondering if DataFrames.jl can handle multi-dimensional arrays, meaning an array with 3 or more dimensions.

I can run

using DataFrames
DataFrame(rand(3,4))

and that works fine.

But if I try:

DataFrame(rand(3,4,5))

Then I get a rather cryptic error message:
ERROR: ArgumentError: 'Array{Float64,3}' iterates 'Float64' values, which don't satisfy the Tables.jl Row-iterator interface

Does anyone know if there is a plan to add multi-dimensional support to the package? I checked the issue list on GitHub but did not see any issues that explicitly mentioned this, though I did not dig too deeply. So I thought I would ask. Thanks.

bkamins · November 5, 2019, 7:57am

Currently DataFrames.jl supports only data that can be represented as a two-dimensional object (i.e. a list of vectors). There are currently no plans to change this.

However, note that you can easily store any data type in a cell of a DataFrame, so e.g. you can write the following:

julia> DataFrame([rand(2) for _ in 1:4, _ in 1:3])
4×3 DataFrame
│ Row │ x1                    │ x2                    │ x3                     │
│     │ Array{Float64,1}      │ Array{Float64,1}      │ Array{Float64,1}       │
├─────┼───────────────────────┼───────────────────────┼────────────────────────┤
│ 1   │ [0.382557, 0.0063816] │ [0.367795, 0.301294]  │ [0.083309, 0.465583]   │
│ 2   │ [0.424023, 0.244837]  │ [0.233543, 0.834364]  │ [0.00453236, 0.548186] │
│ 3   │ [0.392719, 0.628895]  │ [0.979969, 0.534259]  │ [0.588646, 0.825887]   │
│ 4   │ [0.980133, 0.495353]  │ [0.291205, 0.0895148] │ [0.535076, 0.982956]   │

where you effectively have a third dimension nested as a cell value of a two-dimensional DataFrame.

Mattriks · November 5, 2019, 8:23am

or

X = rand(3,4,5)
df = DataFrame(X=collect(eachslice(X, dims=3)))
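A minimal sketch of what this produces, assuming a DataFrames.jl version that accepts keyword-argument column construction: each row holds one 3×4 slice, and the slices can be stacked back into the original array. The name `Y` is only for illustration.

```julia
using DataFrames

X = rand(3, 4, 5)
df = DataFrame(X = collect(eachslice(X, dims = 3)))

# One row per slice along the third dimension; each cell holds a 3×4 matrix.
nrow(df)          # 5
size(df.X[1])     # (3, 4)

# Stacking the cells back along dims = 3 recovers the original array.
Y = cat(df.X...; dims = 3)
Y == X
```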

mcabbott · November 5, 2019, 8:23am

You may also be looking for things like AxisArrays, which have some similarities but allow any number of dimensions.

See also NamedDims and a small zoo of other recent packages discussed here.

nilshg · November 5, 2019, 8:26am

I work a lot with panel data, for which a “3D” representation often seems natural at first. In this context I always find it interesting that Python’s pandas is named after panel data, and indeed used to have a Panel object in addition to the standard DataFrame, but this has been deprecated for a while now in favour of multi-indices on a regular DataFrame: pandas.Panel — pandas 0.23.4 documentation

00krishna · November 6, 2019, 8:01pm

@nilshg this multi-indexing idea is interesting, but seems rather confusing. I get the basic idea: instead of keeping a 3-D volume, you flatten it into a 2-D array and use a multi-index to recover or iterate over the 3-D structure. Say I run a model 100 times, and each run covers 10 years. I would normally store that as a volume of [year, parameters, run]: each run is 10 rows by n columns, and the depth dimension is the run number. But I could instead have a 2-D array with a “year” and a “run” column, so that (run, year) is the multi-index.
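The flattening described here could be sketched roughly as follows. The sizes, the array name `vol`, and the column names `p1`, `p2`, … are made up for illustration, and the sketch assumes a DataFrames.jl version that supports `df[!, col] = ...`.

```julia
using DataFrames

# Hypothetical sizes, smaller than in the post, to keep things readable.
nyears, nparams, nruns = 10, 3, 4
vol = rand(nyears, nparams, nruns)   # vol[year, parameter, run]

# One row per (run, year) pair. Julia arrays are column-major, so when a
# year × run slice is flattened, year varies fastest and run slowest.
long = DataFrame(run = repeat(1:nruns, inner = nyears),
                 year = repeat(1:nyears, outer = nruns))

# Each parameter becomes an ordinary column of the long table.
for p in 1:nparams
    long[!, Symbol("p", p)] = vec(vol[:, p, :])
end
```

The `run` and `year` columns then carry the information that the third dimension and the row position carried in the volume.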

But is there a good explanation of how to use multi-indices in a Julia dataframe? The Python pandas documentation was never very clear about this, as far as I remember. Or I always ran into issues where some implicit conversion turned things into multi-indices and I had to get them back out. So if there is a good explanation of this, please pass along the link. I can check out the Julia DataFrames docs in the meantime.

oxinabox · November 6, 2019, 8:56pm

But is there a good explanation of how to use multi-indices in a Julia dataframe?

Julia DataFrames don’t have indexes; you can do by and groupby on any of the columns and get the same effect.
That actually makes it much clearer than pandas, but potentially slower if you do the same grouping again and again.
(Multi-indexes used to be super complex, because indexes are like columns but not quite.)
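A rough sketch of the grouping approach, assuming a recent DataFrames.jl (1.x), where `by` has been replaced by `combine(groupby(...))`. The table and its column names are hypothetical.

```julia
using DataFrames
using Statistics: mean

# Hypothetical long-format table: 3 runs of 4 years each, one value column.
df = DataFrame(run = repeat(1:3, inner = 4),
               year = repeat(1:4, outer = 3),
               value = collect(1.0:12.0))

# Group on any column; no index has to be declared beforehand.
by_run = combine(groupby(df, :run), :value => mean => :mean_value)

# Grouping on both columns plays the role of a (run, year) multi-index lookup.
gd = groupby(df, [:run, :year])
cell = gd[(run = 2, year = 3)]
```

Keeping `gd` around and reusing it avoids redoing the same grouping each time.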
