Strange DataFrame results

Hello

I have a strange issue. In many places I first build an array of Dictionaries and then convert it to a DataFrame at the end. This is fast enough for my purposes and in most cases works perfectly well, but I recently found one case where the DataFrame contents weren’t what I was expecting.

Instead of the DataFrame’s column names being the keys of the dictionaries, they always came out like this:

names(df)
8-element Array{Symbol,1}:
 :slots
 :keys
 :vals
 :ndel
 :count
 :age
 :idxfloor
 :maxprobe

If I take any single element of the Array of Dictionaries and convert that one Dictionary into a DataFrame, it works OK (I looped through every row, and each row by itself comes out fine). The problem only occurs when I convert the entire Array at once, even though, as I noted, I do this in many other places without issue.

I suspect that this is some sort of internal representation, but I can’t figure out why it is being generated in this one case. Does someone know what this means and what condition could possibly create it?

This is strange. Can you make an MWE? It’s hard to tell exactly what is going on without one.

Things are working fine for me.

julia> d = [Dict(:a => 5, :b => 7), Dict(:a => 100, :b => 89)]
2-element Array{Dict{Symbol,Int64},1}:
 Dict(:a => 5,:b => 7)
 Dict(:a => 100,:b => 89)

julia> DataFrame(d)
2×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 5     │ 7     │
│ 2   │ 100   │ 89    │

I’ve tested various configurations and found that this simple case replicates the issue:

test = Array(Dict[])
push!(test, Dict("test" => "1"))
push!(test, Dict("test" => "2"))
DataFrame(test)

│ Row │ slots                                                                                            │ keys                                                                                                                             │
│     │ Array{UInt8,1}                                                                                   │ Array{String,1}                                                                                                                  │
├─────┼──────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 1   │ [0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00] │ [#undef, #undef, #undef, #undef, "test", #undef, #undef, #undef, #undef, #undef, #undef, #undef, #undef, #undef, #undef, #undef] │
│ 2   │ [0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00] │ [#undef, #undef, #undef, #undef, "test", #undef, #undef, #undef, #undef, #undef, #undef, #undef, #undef, #undef, #undef, #undef] │

If I switch the keys to Symbols, then it works OK:

test = Array(Dict[])
push!(test, Dict(:test => "1"))
push!(test, Dict(:test => "2"))
julia> DataFrame(test)
2×1 DataFrame
│ Row │ test   │
│     │ String │
├─────┼────────┤
│ 1   │ 1      │
│ 2   │ 2      │

Is it expected that we cannot use string keys?

The behavior is certainly bizarre, and we should throw an error instead of behaving this way.

But ultimately, yes. You should use Symbols.

But it looks like it works OK if I use a single dictionary with string keys.

Is this really not allowed?

julia> d = Dict("test" => "1")
Dict{String,String} with 1 entry:
  "test" => "1"

julia> DataFrame(d)
1×1 DataFrame
│ Row │ test   │
│     │ String │
├─────┼────────┤
│ 1   │ 1      │

DataFrames don’t use Strings as column names, they use Symbols. So you shouldn’t expect using a String to work.

It’s pretty easy to turn a String into a Symbol, via Symbol(s).
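
For the array-of-dictionaries case from earlier in the thread, one way to apply this is to rebuild each dictionary with Symbol keys before handing the array to DataFrame. This is only a minimal sketch: `symbolize_keys` is a hypothetical helper name, not something DataFrames.jl or Tables.jl provides.

```julia
# Hypothetical helper (not part of any package): copy a dictionary,
# converting every key to a Symbol and leaving the values untouched.
symbolize_keys(d::AbstractDict) = Dict(Symbol(k) => v for (k, v) in d)

# Same data as the failing example, with String keys.
rows = [Dict("test" => "1"), Dict("test" => "2")]

# Convert every row; the result holds dictionaries keyed by Symbols.
sym_rows = [symbolize_keys(d) for d in rows]
```

The resulting `sym_rows` has the same shape as the Symbol-keyed example that worked above, so it should be usable as `DataFrame(sym_rows)`.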

The problem (or feature) is not in DataFrames.jl but in Tables.jl, which DataFrames.jl falls back to. Tables.jl defines the following rules:

julia> d = Dict(:test => "1")
Dict{Symbol,String} with 1 entry:
  :test => "1"

julia> Tables.isrowtable([d, d])
true

julia> d = Dict("test" => "1")
Dict{String,String} with 1 entry:
  "test" => "1"

julia> Tables.isrowtable([d, d])
false
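
This also seems to explain the odd column names in the original post: they match the internal fields of Julia's `Dict` struct, which suggests that when the array is not recognised as a row table, each dictionary is treated as a plain object and its fields become the columns. A small sketch to check this, assuming the field names in your Julia version still match (they are implementation details and may change):

```julia
# List the internal fields of the Dict struct for String keys and values.
# These are implementation details of Base, not a public API.
internal = fieldnames(Dict{String,String})
println(internal)

# Compare against two of the column names reported in the original post.
println(:slots in internal)
println(:maxprobe in internal)
```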