Convert Array to DataFrame

Mastomaki · November 12, 2022, 2:42pm

Dear Friends,

I would like to convert a multi-dimensional array to a DataFrame. For example:

a = [0 1;2 3]

Expected result:


4×3 DataFrame
 Row │ index1  index2  value
     │ Int64   Int64   Int64
─────┼───────────────────────
   1 │      1       1      0
   2 │      1       2      1
   3 │      2       1      2
   4 │      2       2      3

Thus each dimension of the initial array forms a column in the DataFrame and the last column is always the value column. For 2-dimensional arrays I can get this format in the following way:

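using DataFrames
temp = DataFrame(a, :auto)
temp.rowind = 1:size(a, 1)   # a range, not a one-element vector holding a range
temp = stack(temp, 1:2)      # columns: rowind, variable, value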

pdeffebach · November 12, 2022, 5:00pm

Maybe this?

julia> a = [0 1; 2 3];

julia> df = DataFrame(Tables.table(a))
2×2 DataFrame
 Row │ Column1  Column2
     │ Int64    Int64
─────┼──────────────────
   1 │       0        1
   2 │       2        3

Won’t work with more than 2 dimensions though.

rafael.guerra · November 12, 2022, 6:39pm

This produces the expected result:

using DataFrames
a = reshape(1:24, 2, 3, 4)
it = Iterators.product(axes(a)...)
df = rename!(DataFrame(it), "index".*string.(1:ndims(a)))
df.value = vec(a)
sort!(df)

rocco_sprmnt21 · November 12, 2022, 6:46pm

using DataFrames
A=reshape(1:24, 2,3,4)
idx=[Symbol("index$i") for i in 1:ndims(A)]
nts=[(;zip(idx,Tuple(t))...) for t in CartesianIndices(A)]
df=DataFrame(nts)
df.values=A[:]

df

or more succinctly

DataFrame([(;zip([idx;:values],(i.I...,a))...) for (i,a) in zip(CartesianIndices(A),A)])

Mastomaki · November 13, 2022, 11:26am

Thanks. What is the meaning of

(;zip(...)...)

?

Mastomaki · November 13, 2022, 11:34am

With a 2-dimensional array this produces the error:

ERROR: ArgumentError: DataFrame constructor from a Matrix requires passing :auto as a second argument to automatically generate column names: DataFrame(matrix, :auto)

rocco_sprmnt21 · November 13, 2022, 11:47am

Try it this way (add [:] after CartesianIndices(A)):

DataFrame([(;zip([idx;:values],(i.I...,a))...) for (i,a) in zip(CartesianIndices(A)[:],A)])

rocco_sprmnt21 · November 13, 2022, 11:52am

This is one of the syntaxes for constructing a NamedTuple, that is, a structure that associates n symbols with n values.

In this case it builds a vector of NamedTuples that the DataFrame constructor can turn into the DataFrame you want.

Others, better placed than me, could illustrate the potential of this syntax and comment on its use in cases like this.
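
As a small illustration of the `(; zip(...)...)` idiom, here is a sketch in plain Julia with no packages. The variable names `colnames`, `vals`, `nt` and `nt_literal` are made up for this example and do not come from the posts above.

```julia
# Field names and the values to attach to them (illustrative data).
colnames = [:index1, :index2, :value]
vals = (1, 2, 0)

# zip yields the 2-tuples (:index1, 1), (:index2, 2), (:value, 0).
# Splatting them after the `;` inside parentheses builds a NamedTuple.
nt = (; zip(colnames, vals)...)

# The same NamedTuple written out by hand.
nt_literal = (index1 = 1, index2 = 2, value = 0)
```

Fields are then accessed by name, for example `nt.index2`.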

rocco_sprmnt21 · November 13, 2022, 1:11pm

It still remains to be clarified why, in the 3-dimensional case, the array of Cartesian indices works without the need to vectorize, while in the 2-dimensional case it does not.

julia> using DataFrames

julia> A=reshape(1:24, 2,3,4)
2×3×4 reshape(::UnitRange{Int64}, 2, 3, 4) with eltype Int64:
[:, :, 1] =
 1  3  5
 2  4  6

[:, :, 2] =
 7   9  11
 8  10  12

[:, :, 3] =
 13  15  17
 14  16  18

[:, :, 4] =
 19  21  23
 20  22  24

julia> idx=[Symbol("index$i") for i in 1:ndims(A)]
3-element Vector{Symbol}:
 :index1
 :index2
 :index3

julia> DataFrame([(;zip([idx;:values],(i.I...,a))...) for (i,a) in zip(CartesianIndices(A),A)])
24×4 DataFrame
 Row │ index1  index2  index3  values 
     │ Int64   Int64   Int64   Int64  
─────┼────────────────────────────────
   1 │      1       1       1       1
   2 │      2       1       1       2
   3 │      1       2       1       3
   4 │      2       2       1       4
   5 │      1       3       1       5
   6 │      2       3       1       6
   7 │      1       1       2       7
   8 │      2       1       2       8
   9 │      1       2       2       9
  10 │      2       2       2      10
  11 │      1       3       2      11
  12 │      2       3       2      12
  13 │      1       1       3      13
  14 │      2       1       3      14
  15 │      1       2       3      15
  16 │      2       2       3      16
  17 │      1       3       3      17
  18 │      2       3       3      18
  19 │      1       1       4      19
  20 │      2       1       4      20
  21 │      1       2       4      21
  22 │      2       2       4      22
  23 │      1       3       4      23
  24 │      2       3       4      24

julia> A=reshape(1:12, 3,4)
3×4 reshape(::UnitRange{Int64}, 3, 4) with eltype Int64:
 1  4  7  10
 2  5  8  11
 3  6  9  12

julia> idx=[Symbol("index$i") for i in 1:ndims(A)]
2-element Vector{Symbol}:
 :index1
 :index2

julia> DataFrame([(;zip([idx;:values],(i.I...,a))...) for (i,a) in zip(CartesianIndices(A),A)])
ERROR: ArgumentError: `DataFrame` constructor from a `Matrix` requires passing :auto as a second argument to automatically generate column names: `DataFrame(matrix, :auto)`
Stacktrace:
 [1] DataFrame(matrix::Matrix{NamedTuple{(:index1, :index2, :values), Tuple{Int64, Int64, Int64}}})
   @ DataFrames C:\Users\sprmn\.julia\packages\DataFrames\hFLqf\src\dataframe\dataframe.jl:381
 [2] top-level scope
   @ c:\Users\sprmn\.julia\v1.8\dataframes24.jl:185

julia> DataFrame([(;zip([idx;:values],(i.I...,a))...) for (i,a) in zip(CartesianIndices(A)[:],A)])
12×3 DataFrame
 Row │ index1  index2  values 
     │ Int64   Int64   Int64
─────┼────────────────────────
   1 │      1       1       1
   2 │      2       1       2
   3 │      3       1       3
   4 │      1       2       4
   5 │      2       2       5
   6 │      3       2       6
   7 │      1       3       7
   8 │      2       3       8
   9 │      3       3       9
  10 │      1       4      10
  11 │      2       4      11
  12 │      3       4      12

Dan · November 13, 2022, 1:56pm

Another version of this method fits on one line:

(df = DataFrame(collect.(zip(Tuple.(keys(A))...)),:auto)).val = vec(A);

Here is an example:

julia> A=reshape(1:12, 3,4)
3×4 reshape(::UnitRange{Int64}, 3, 4) with eltype Int64:
 1  4  7  10
 2  5  8  11
 3  6  9  12

julia> (df = DataFrame(collect.(zip(Tuple.(keys(A))...)),:auto)).val = vec(A);

julia> df
12×3 DataFrame
 Row │ x1     x2     val   
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1      1
   2 │     2      1      2
   3 │     3      1      3
   4 │     1      2      4
   5 │     2      2      5
   6 │     3      2      6
   7 │     1      3      7
   8 │     2      3      8
   9 │     3      3      9
  10 │     1      4     10
  11 │     2      4     11
  12 │     3      4     12

The column names are a bit off, but close enough perhaps.
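
If the `index1`, `index2`, ... `value` names from the original question are wanted for any number of dimensions, one possible sketch is below. The helper name `array_to_long` is hypothetical and not part of DataFrames.jl; it is only one way to combine the ideas in this thread.

```julia
using DataFrames

# Hypothetical helper (not from the thread): long-format DataFrame with one
# index column per dimension of A plus a value column.
function array_to_long(A::AbstractArray)
    inds = vec(CartesianIndices(A))             # all indices, column-major order
    df = DataFrame()
    for d in 1:ndims(A)
        df[!, "index$d"] = getindex.(inds, d)   # d-th coordinate of every index
    end
    df.value = vec(collect(A))                  # values in the same order
    return df
end

df = array_to_long([0 1; 2 3])
sort!(df)   # optional: order rows by index1, then index2, as in the question
```

Because the indices are flattened with `vec` before any DataFrame is built, the same function should apply to 2-dimensional and higher-dimensional arrays alike.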

Mastomaki · November 13, 2022, 2:20pm

I’m not sure, but maybe this provides the answer:

https://www.juliabloggers.com/working-with-matrices-in-dataframes-jl-1-0/

In other words, constructing a DataFrame from a Matrix passed as the single argument is not allowed.
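
For reference, a short sketch of the two-argument forms that the error message points to. The column names `:first` and `:second` are arbitrary choices for this example.

```julia
using DataFrames

a = [0 1; 2 3]

# Passing :auto generates the column names x1, x2, ...
df_auto = DataFrame(a, :auto)

# Alternatively, supply the column names explicitly.
df_named = DataFrame(a, [:first, :second])
```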

rocco_sprmnt21 · November 13, 2022, 2:37pm

The inconsistency, in my view, concerns a different aspect.
The following expression

[(;zip([idx;:values],(i.I...,a))...) for (i,a) in zip(CartesianIndices(A),A)]

produces an Array{T,3} in the case A = reshape(1:24, 3,2,4)

and a Matrix{T} in the case A = reshape(1:12, 3,4).
Since, ultimately, Matrix{T} == Array{T,2}, I don’t understand why the DataFrame constructor gives the expected result for the case ndims = 3 and not for the case ndims = 2.

Here T is a NamedTuple type.

To be clear, I would expect it not to work in either case without explicit vectorization.
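
The shape behaviour described here can be checked in plain Julia without DataFrames. The sketch below uses ordinary tuples instead of NamedTuples to keep it short; the names `A2`, `A3`, `rows2`, `rows3`, `flat2` and `flat3` are illustrative. The stack trace earlier in the thread shows the error coming from a method with the signature `DataFrame(matrix::Matrix{...})`, which suggests (as an assumption, not something confirmed in this thread) that a Matrix argument is caught by a dedicated method while a 3-dimensional array is not.

```julia
A2 = reshape(1:12, 3, 4)      # 2-dimensional input
A3 = reshape(1:24, 2, 3, 4)   # 3-dimensional input

# A comprehension over zip(CartesianIndices(A), A) keeps the shape of A,
# so the result is a Matrix for A2 and a 3-dimensional Array for A3.
rows2 = [(i.I..., a) for (i, a) in zip(CartesianIndices(A2), A2)]
rows3 = [(i.I..., a) for (i, a) in zip(CartesianIndices(A3), A3)]

# vec (or [:]) flattens either result into a Vector in column-major order.
flat2 = vec(rows2)
flat3 = vec(rows3)
```

Flattening before calling the constructor gives it a Vector in both cases, so the 2-dimensional and 3-dimensional inputs are treated the same way.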
