Natural column order for DataFrames

Happy New Year! :fireworks::confetti_ball::tada:

I have a Dict{Symbol,Vector} and want to create a DataFrame out of it. It’s easy enough to do DataFrame(dict) but I also want the columns to appear in the right order. For example,

julia> x = Dict(:data    => Dict(:foo => [1,2,3], :bar => [4,5,6], :mouse => [7,8,9]),
                :symbols => [:foo, :bar, :mouse])
Dict{Symbol,Any} with 2 entries:
  :symbols => Symbol[:foo, :bar, :mouse]
  :data    => Dict(:mouse=>[7, 8, 9],:bar=>[4, 5, 6],:foo=>[1, 2, 3])

julia> df = DataFrame(x[:data])
3×3 DataFrames.DataFrame
│ Row │ bar │ foo │ mouse │
├─────┼─────┼─────┼───────┤
│ 1   │ 4   │ 1   │ 7     │
│ 2   │ 5   │ 2   │ 8     │
│ 3   │ 6   │ 3   │ 9     │

Since I have the proper order in x[:symbols], I could rearrange it as below. It’s a bit ugly so I’m wondering if there’s a better way… Perhaps DataFrames.jl should have another constructor.

julia> df = DataFrame([x[:data][v] for v in x[:symbols]], x[:symbols])
3×3 DataFrames.DataFrame
│ Row │ foo │ bar │ mouse │
├─────┼─────┼─────┼───────┤
│ 1   │ 1   │ 4   │ 7     │
│ 2   │ 2   │ 5   │ 8     │
│ 3   │ 3   │ 6   │ 9     │

I think that the Pair{Symbol,Vector}... constructor should do what you want, i.e.

using DataFrames
cols = Dict(:foo => [1,2,3], :bar => [4,5,6], :mouse => [7,8,9])
order = [:foo, :bar, :mouse]
DataFrame([name => cols[name] for name in order]...)

otherwise the DataFrame(columns, cnames) constructor you are using is fine.

Thanks @Tamas_Papp. I hadn’t thought about that one.

I still hope that DataFrame would have another constructor that just takes a Dict and an array of keys of the specified order. Perhaps I’ll submit a PR there :slight_smile:

Ideally (as in your example):

DataFrame(cols, order)

When conversions are relatively easy, I don’t think it is a good idea to provide constructors for cases like this (there are many similar ones).

All the constructor would do is apply the same one-liners as above, with the cost of increasing code complexity and maintenance burden for the package.
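For anyone who needs this often, the one-liner can live in a small helper in user code instead of in the package. A minimal sketch, assuming a recent DataFrames.jl release; `ordered_dataframe` is a hypothetical name and not part of DataFrames.jl:

```julia
using DataFrames

# Hypothetical helper (not a DataFrames.jl function): build a DataFrame from
# a dictionary of columns, taking the columns in the order given by `order`.
ordered_dataframe(cols, order) =
    DataFrame([name => cols[name] for name in order]...)

cols  = Dict(:foo => [1,2,3], :bar => [4,5,6], :mouse => [7,8,9])
order = [:foo, :bar, :mouse]

df = ordered_dataframe(cols, order)  # columns appear as foo, bar, mouse
```

A key listed in `order` but missing from `cols` raises a `KeyError`, which is probably the behavior you want here.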

Agreed, especially since in this case it can be written in an even simpler form:

DataFrame((v => x[:data][v] for v in x[:symbols])...)

Looks like what is really needed here is an ordered dictionary type which would preserve the order in which you want the keys to appear. Adding DataFrame constructors isn’t really the appropriate solution.

@Tamas_Papp and @bkamins, I’m not picking on you guys but I humbly disagree. As a package writer, I believe the interface needs to be as user friendly as possible. Am I the only one who has this need? If everyone like me is writing the same one-liner, then it would make more sense to put that one line in the package.

I like @nalimilan’s suggestion, however. If we have a keys() function that takes an OrderedDict object and returns the natural order of the keys, then we can bypass the auto-sorting feature in DataFrames. Hence, we don’t need to burden the DataFrames package with the additional constructor.

For reference, DataFrames.jl’s code looks like this:

function Base.convert(::Type{DataFrame}, d::Associative)
    colnames = keys(d)
    if isa(d, Dict)
        colnames = sort!(collect(keys(d)))
    else
        colnames = keys(d)
    end
    colindex = Index(Symbol[k for k in colnames])
    columns = Any[d[c] for c in colnames]
    DataFrame(columns, colindex)
end

Precisely:

using DataStructures
cols = OrderedDict(:foo => [1,2,3], :bar => [4,5,6], :mouse => [7,8,9])
DataFrame(cols)

does the right thing, since it goes through the second branch.
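To see why the second branch matters, here is a small sketch of the same branching logic outside DataFrames. `column_order` is a hypothetical stand-in for the column-name selection in the method quoted above, not a DataFrames.jl function; it assumes DataStructures.jl is installed:

```julia
using DataStructures  # provides OrderedDict

# Hypothetical stand-in for the branch in the quoted convert method:
# a plain Dict gets its keys sorted, any other dictionary type keeps
# its own iteration order.
column_order(d) = isa(d, Dict) ? sort!(collect(keys(d))) : collect(keys(d))

plain   = Dict(:foo => [1,2,3], :bar => [4,5,6], :mouse => [7,8,9])
ordered = OrderedDict(:foo => [1,2,3], :bar => [4,5,6], :mouse => [7,8,9])

column_order(plain)    # sorted: [:bar, :foo, :mouse]
column_order(ordered)  # insertion order: [:foo, :bar, :mouse]
```

OrderedDict is not a subtype of Dict, so the `isa(d, Dict)` check is false and the keys are used as they come.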

I agree that for unambiguous conversions such as

DataFrame(dict::OrderedDict)

it would be nice to provide constructors (although it does raise a dependency issue).

However, as a DataFrames user I mostly agree with @Tamas_Papp and @bkamins. I don’t see a need to try to guess every possible use of a constructor when a simple and efficient bit of code like a list comprehension would suffice. After all, one of the primary reasons to use a language like Julia as opposed to a language like C++ is that there’s a lot of code which it’s very simple to write so we don’t require anywhere near as many specialized functions.

The point is, we don’t need any special constructors. We can just rely on the fact that keys(::OrderedDict) returns keys in the right order.
