From DataFrame to multidimensional Array

Say I have a DataFrame df with columns named :i, :j, :k that are positive integers and are meant to jointly form a unique identifier. df also contains a column named :z, of any type, that holds the data for the indices :i, :j, :k:

using DataFrames

# 3 rows of random integers; column names are passed directly to the constructor
df = DataFrame(rand(1:10, 3, 4), [:i, :j, :k, :z])

How can I convert that to an array A such that A[i, j, k] holds the data in :z for indices :i = i, :j = j, :k = k?

In principle, I can populate the array A entry by entry with a for loop, but I thought there should be a more elegant way of doing that.

I’m not aware of such a method, but I’d guess that such a method would end up looking like a for loop, so I think the question of elegance is mostly: do you care about the for loop existing at all in the internals or are you looking for an abstraction?


I don’t care about the for loop existing behind the scenes. I just thought there could be a canned way to do that.

I’m not aware of one, but it might exist if you dig into the DataFrames code. Seems worth making a PR if you write your own solution you’re proud of.
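For what it is worth, one loop-free way to write this uses only Base indexing: broadcast CartesianIndex over the index columns and assign the values in one statement. This is a minimal sketch, not a DataFrames feature. The NamedTuple tbl is a hypothetical stand-in for the DataFrame's columns (with a real DataFrame the same lines should work with df in place of tbl), and filling unseen cells with missing is an assumption about what is wanted there.

```julia
# Stand-in for the DataFrame: a NamedTuple of column vectors (assumed data).
tbl = (i = [1, 2, 2], j = [1, 1, 3], k = [1, 2, 1], z = ["a", "b", "c"])

# Array size taken from the largest index seen in each column.
dims = (maximum(tbl.i), maximum(tbl.j), maximum(tbl.k))

# Cells with no matching row stay `missing` (an assumption, not a requirement).
A = Array{Union{Missing, eltype(tbl.z)}}(missing, dims)

# One CartesianIndex per row; assigning a vector writes the values in row order.
A[CartesianIndex.(tbl.i, tbl.j, tbl.k)] = tbl.z
```

If (i, j, k) is not actually unique, later rows silently overwrite earlier ones, so it may be worth checking uniqueness first.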

Here is an MWE for a Matrix. It’s not very performant because it iterates with eachrow, but it should be easy to see how to turn the last step into a fast function.

This MWE also assumes we know the size of the matrix beforehand, but the size could be obtained quite easily by calling maximum on each of the index columns.

julia> df = DataFrame(i = Int[], j = Int[])
0×2 DataFrame


julia> t = Base.Iterators.product(1:4, 1:6);

julia> m = Array{Float64}(undef, 4, 6);

julia> for ti in t
       push!(df, ti)
       end

julia> df.val = rand(24);

julia> for row in eachrow(df)
       m[row.i, row.j] = row.val
       end
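The same idea extends to three indices. Below is a rough sketch of the "fast function" version: the columns are passed as plain vectors so the loop runs on concretely typed arguments, and the size comes from maximum of each index column. The name to_array and the default keyword are made up for illustration and are not part of DataFrames.

```julia
# Hypothetical helper: build a dense 3-D array from three index vectors
# and a value vector. Cells without a row are set to `default`.
function to_array(i, j, k, z; default = missing)
    dims = (maximum(i), maximum(j), maximum(k))
    A = Array{Union{typeof(default), eltype(z)}}(undef, dims)
    fill!(A, default)
    for n in eachindex(i, j, k, z)   # errors if the vectors differ in length
        A[i[n], j[n], k[n]] = z[n]
    end
    return A
end

# Plain vectors here; with a DataFrame one would pass df.i, df.j, df.k, df.z.
A = to_array([1, 2, 2], [1, 1, 3], [1, 2, 1], [10.0, 20.0, 30.0])
```

Passing the columns into a function rather than looping over eachrow(df) is what lets the compiler specialize on the column types.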

Just guessing, but might it be possible to (ab?)use a sparse matrix for this? For two dimensions, you could go with the built-in sparse matrix. For higher dimensions I found this package, which unfortunately seems unmaintained: https://github.com/jw3126/SimpleSparseArrays.jl
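For the two-dimensional case, a sketch with the SparseArrays standard library could look like the following. The vectors rows, cols, and vals are assumed example data standing in for the :i, :j, and value columns.

```julia
using SparseArrays

# Assumed example data: row indices, column indices, and values.
rows = [1, 2, 4]
cols = [1, 3, 6]
vals = [0.5, 1.5, 2.5]

# Build a 4×6 sparse matrix; entries not listed are structural zeros.
S = sparse(rows, cols, vals, 4, 6)

# Convert to a dense Matrix if an ordinary array is needed.
M = Matrix(S)
```

Two caveats: sparse adds up the values of duplicate (row, column) pairs by default rather than overwriting, and absent entries read as zero, so this fits numeric data better than a :z column of arbitrary type.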

This also seems to be relevant:


Thank you all. I just did an ugly loop with DataFramesMeta's @where. It was surprisingly fast … or not, it’s Julia :)
