# DataFrame to multidimensional array

Hey,

Suppose I have a DataFrame that represents a multidimensional function, something like this:
`df = DataFrame([(a = x, b = y, c = z, d = x+y+z) for x in 1:6 for y in 1:4 for z in 1:2])`

What is the best way to transform it into the following multidimensional array?
`[x+y+z for x in 1:6, y in 1:4, z in 1:2]`

Currently I’m doing it with `groupby`, but it gets complicated with higher dimensions (I’m a Matlab user, so I’m used to working with multidimensional arrays).

Thank you

Are `x`, `y`, `z` integers?

I assume you know the sizes of x, y and z? Inferring those from the vectors first would be a more annoying step.

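``````
using DataFrames

df = DataFrame([(a = x, b = y, c = z, d = x+y+z) for x in 1:6 for y in 1:4 for z in 1:2])

# Rows are stored with z varying fastest, so reshape to (z, y, x) and then reorder to (x, y, z)
arr = permutedims(reshape(copy(df.d), (2, 4, 6)), (3, 2, 1))
``````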

This gives:

``````
6×4×2 Array{Int64, 3}:
[:, :, 1] =
 3  4   5   6
 4  5   6   7
 5  6   7   8
 6  7   8   9
 7  8   9  10
 8  9  10  11

[:, :, 2] =
 4   5   6   7
 5   6   7   8
 6   7   8   9
 7   8   9  10
 8   9  10  11
 9  10  11  12

julia> arr == [x+y+z for x in 1:6, y in 1:4, z in 1:2]
true
``````

Note that `for x in 1:6 for y in 1:4 for z in 1:2` produces the dimensions in exactly the opposite order from `for x in 1:6, y in 1:4, z in 1:2`: in the first form `z` varies fastest, in the second `x` does. That is why the `permutedims` is needed.
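If the sizes are not known in advance, they can be inferred from the key columns. Here is a minimal sketch in plain Julia (no packages), assuming the rows form a complete grid and are sorted as in the example, with the last key column varying fastest. `table_to_array` is a hypothetical helper name, not a function from DataFrames.jl; with a DataFrame you would call it as `table_to_array(df.d, df.a, df.b, df.c)`.

```julia
# Rebuild the example's columns without any packages.
rows = [(a = x, b = y, c = z, d = x + y + z) for x in 1:6 for y in 1:4 for z in 1:2]
a = [r.a for r in rows]
b = [r.b for r in rows]
c = [r.c for r in rows]
d = [r.d for r in rows]

# Hypothetical helper: `vals` is the value column, `keycols` are the key columns,
# listed from the slowest-varying to the fastest-varying one.
# Assumption: the rows are already sorted in that order (this is not checked).
function table_to_array(vals::AbstractVector, keycols::AbstractVector...)
    dims = map(k -> length(unique(k)), keycols)   # (6, 4, 2) for the example
    prod(dims) == length(vals) ||
        throw(ArgumentError("rows do not form a complete grid"))
    n = length(dims)
    # reshape is column-major, so the fastest-varying key becomes dimension 1 ...
    stacked = reshape(vals, reverse(dims))
    # ... and permutedims restores the order in which the key columns were listed.
    return permutedims(stacked, ntuple(i -> n + 1 - i, n))
end

arr = table_to_array(d, a, b, c)
arr == [x + y + z for x in 1:6, y in 1:4, z in 1:2]   # true
```

If the rows may arrive in a different order, sort the table by the key columns first; the helper only checks that the number of rows matches the grid size.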


Here is an alternative using TensorCast.jl:

``````
using TensorCast
@cast v[i,j,k] := copy(df.d)[k⊗j⊗i] (i ∈ 1:6, j ∈ 1:4, k ∈ 1:2)
``````
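As far as I understand the notation, `k⊗j⊗i` describes one combined index in which `k` varies fastest, which matches the row order of the table. The same mapping can be written out by hand in plain Julia; this is only a sketch of the index arithmetic for the sizes in the example, not what TensorCast generates internally.

```julia
# Same values and order as df.d in the example: z varies fastest, x slowest.
d = [x + y + z for x in 1:6 for y in 1:4 for z in 1:2]

v = Array{Int}(undef, 6, 4, 2)
for i in 1:6, j in 1:4, k in 1:2
    # Row number of (i, j, k) in the table: k steps by 1, j by 2, i by 2 * 4.
    n = k + 2 * (j - 1) + 2 * 4 * (i - 1)
    v[i, j, k] = d[n]
end

v == [x + y + z for x in 1:6, y in 1:4, z in 1:2]   # true
```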

Depending on how you obtain the data in the first place, you may be able to refactor that process to return a multidimensional array instead of a DataFrame. Arrays are easy and convenient to use in Julia, and they are more general.
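For instance, if the values come from evaluating a function on a grid, the array can be built directly. A minimal sketch, where `f` is only a stand-in for whatever actually produces the data:

```julia
f(x, y, z) = x + y + z          # stand-in for the real function (assumption)
xs, ys, zs = 1:6, 1:4, 1:2

# Comprehension with commas: one array dimension per iterator.
arr_comp = [f(x, y, z) for x in xs, y in ys, z in zs]

# Broadcasting: put each axis along its own dimension.
arr_bcast = f.(xs, reshape(ys, 1, :), reshape(zs, 1, 1, :))

arr_comp == arr_bcast           # true, both are 6×4×2
```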

But for this particular operation, there’s a nice function in `AxisKeys.jl` that converts a table to a multidimensional array:

``````
julia> using AxisKeys

julia> wrapdims(df, :d, :a, :b, :c)
3-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   a ∈ 6-element Vector{Int64}
→   b ∈ 4-element Vector{Int64}
◪   c ∈ 2-element Vector{Int64}
And data, 6×4×2 Array{Int64, 3}:
[:, :, 1] ~ (:, :, 1):
       (1)  (2)  (3)  (4)
  (1)    3    4    5    6
  (2)    4    5    6    7
  (3)    5    6    7    8
  (4)    6    7    8    9
  (5)    7    8    9   10
  (6)    8    9   10   11

[:, :, 2] ~ (:, :, 2):
       (1)  (2)  (3)  (4)
  (1)    4    5    6    7
  (2)    5    6    7    8
  (3)    6    7    8    9
  (4)    7    8    9   10
  (5)    8    9   10   11
  (6)    9   10   11   12
``````

It would even work with non-consecutive or non-numeric `x`, `y`, `z` values.
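To illustrate what handling such keys involves, here is a rough sketch in plain Julia that places each value by looking up its keys, so the rows may come in any order. `scatter_to_array` is a hypothetical helper written for this post; it is not how `wrapdims` is implemented, and I have not compared their behaviour in detail.

```julia
# Hypothetical helper: put each value at the position of its keys among the
# sorted unique keys of each column. Grid cells without a row stay `missing`.
function scatter_to_array(vals::AbstractVector, keycols::AbstractVector...)
    levels = map(k -> sort(unique(k)), keycols)
    lookup = map(l -> Dict(key => i for (i, key) in enumerate(l)), levels)
    out = Array{Union{Missing, eltype(vals)}}(missing, map(length, levels))
    for r in eachindex(vals)
        idx = map((lk, col) -> lk[col[r]], lookup, keycols)
        out[idx...] = vals[r]
    end
    return out, levels
end

# Non-numeric and non-consecutive keys, rows in no particular order.
site  = ["north", "south", "north", "south"]
year  = [2020, 2020, 2010, 2010]
value = [1.0, 2.0, 3.0, 4.0]

arr, levels = scatter_to_array(value, site, year)
# levels holds the sorted keys along each dimension;
# arr[1, 2] is the value for ("north", 2020).
```

Returning `levels` alongside the array keeps track of which key each index stands for, which is the bookkeeping that a keyed array does for you.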


The performance of `wrapdims()` on this specific example seems to be way subpar.