Is there a DictTables.jl?

Amin_Yahyaabadi · July 8, 2020, 1:36am

I have a problem in which the data is tabular, but one of the columns holds the key for the rest of each row. For example:

│ id     │ x2    │ y2    │
│ String │ Int64 │ Int64 │
├────────┼───────┼───────┤
│ a      │ 1     │ 4     │
│ b      │ 2     │ 5     │
│ k      │ 3     │ 6     │
│ w      │ 3     │ 5     │
│ c      │ 4     │ 5     │

This data can be represented best as a Dict of Dicts. For example, the first row would be:

"a" => Dict(:x2 => 1 , :y2 => 4)

However, since the data is tabular, supporting the Tables.jl interface would help in many situations. I was wondering whether the Tables.jl interface can be extended to support the concept of keys.
It should have an iterator similar to that of Dict:

for (key, rest_of_the_row) in dicttable
    # iterate over keys
end
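As a point of comparison, here is a minimal sketch of the plain Dict-of-Dicts version using only Base Julia. The variable names (`ids`, `x2`, `y2`, `table`) are made up for illustration, and note that a `Dict` does not iterate in insertion order.

```julia
ids = ["a", "b", "k", "w", "c"]
x2  = [1, 2, 3, 3, 4]
y2  = [4, 5, 6, 5, 5]

# One inner Dict per row, keyed by the value in the `id` column.
table = Dict(id => Dict(:x2 => x, :y2 => y) for (id, x, y) in zip(ids, x2, y2))

# Iterating a Dict yields key => value pairs, which destructure into two names.
for (key, rest_of_the_row) in table
    println(key, " => ", rest_of_the_row)
end
```

This already gives the desired iteration, but it gives up the column layout that Tables.jl consumers expect.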

Problems I could solve naturally with this interface:

# In `DictTable`, the first argument names the key column.

# Suppose I want to merge `data` into `old_data` such that the output:
# - keeps only the rows whose `id` is already in `old_data`,
# - includes the columns `x1` and `x2`, and
# - has a new column `z` whose value in each row is `y2 + y1`.
data = DictTable(:id, :id => ["a", "b", "k", "w", "c"], :x2 => [1, 2, 3, 3, 4], :y2 => [4, 5, 6, 5, 5])

old_data = DictTable(:id, :id => ["c", "b", "a"], :x1 => [0, 1, 2], :y1 => [4, 5, 6])
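For reference, a rough sketch of that merge with ordinary `Dict`s of `NamedTuple`s standing in for the hypothetical `DictTable`. The name `merged` and the reading of the requirements (keep only the ids present in both tables) are assumptions.

```julia
# Stand-ins for the two keyed tables: id => rest of the row.
data = Dict("a" => (x2 = 1, y2 = 4), "b" => (x2 = 2, y2 = 5), "k" => (x2 = 3, y2 = 6),
            "w" => (x2 = 3, y2 = 5), "c" => (x2 = 4, y2 = 5))
old_data = Dict("c" => (x1 = 0, y1 = 4), "b" => (x1 = 1, y1 = 5), "a" => (x1 = 2, y1 = 6))

# Keep only the rows of `data` whose id already exists in `old_data`,
# carry over x1 and x2, and compute z = y2 + y1.
merged = Dict(
    id => (x1 = old_data[id].x1, x2 = row.x2, z = row.y2 + old_data[id].y1)
    for (id, row) in data if haskey(old_data, id)
)
```

Here `merged["a"]` is `(x1 = 2, x2 = 1, z = 10)`, and the ids `"k"` and `"w"` are dropped because they are not in `old_data`.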

Tamas_Papp · July 8, 2020, 4:53am

This is a valid, but somewhat special use case which may not justify changing the Tables API. I see two solutions:

1. a manual conversion to a Dict, or
2. a new type, say DictTable, which supports both the Tables.jl interface and AbstractDict (no conflict is apparent to me, but maybe I am missing something).
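A minimal sketch of what the second option might look like, assuming Tables.jl is installed. The `DictTable` type, its fields, and the `rowat` helper are all hypothetical and not part of any package. The sketch stores the key column, the remaining columns, and a hash index from key to row number.

```julia
using Tables

# Hypothetical type: a column table that is also an AbstractDict from key to row.
struct DictTable{K, C <: NamedTuple} <: AbstractDict{K, NamedTuple}
    keycol::Symbol
    keyvals::Vector{K}
    rest::C
    index::Dict{K, Int}
end

function DictTable(keycol::Symbol, cols::Pair{Symbol}...)
    keyvals = last(only(p for p in cols if first(p) === keycol))
    rest = (; [p for p in cols if first(p) !== keycol]...)
    index = Dict(k => i for (i, k) in enumerate(keyvals))
    length(index) == length(keyvals) || throw(ArgumentError("duplicate keys"))
    return DictTable(keycol, keyvals, rest, index)
end

# Row number i as a NamedTuple, without the key column (hypothetical helper).
rowat(t::DictTable, i::Int) = map(col -> col[i], t.rest)

# AbstractDict side: length, iteration over key => row pairs, and hashed lookup.
Base.length(t::DictTable) = length(t.keyvals)
function Base.iterate(t::DictTable, i::Int = 1)
    i > length(t) && return nothing
    return (t.keyvals[i] => rowat(t, i), i + 1)
end
function Base.get(t::DictTable, key, default)
    i = get(t.index, key, 0)
    return i == 0 ? default : rowat(t, i)
end

# Tables.jl side: expose all columns, key column first.
Tables.istable(::Type{<:DictTable}) = true
Tables.columnaccess(::Type{<:DictTable}) = true
Tables.columns(t::DictTable) = merge(NamedTuple{(t.keycol,)}((t.keyvals,)), t.rest)

data = DictTable(:id, :id => ["a", "b", "k", "w", "c"],
                 :x2 => [1, 2, 3, 3, 4], :y2 => [4, 5, 6, 5, 5])

data["a"]                  # dictionary-style lookup of one row
Tables.columntable(data)   # a Tables.jl consumer sees all three columns
```

In this sketch the two interfaces do not get in each other's way: iteration follows `AbstractDict` (pairs of key and row), while table consumers go through `Tables.columns`.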

Amin_Yahyaabadi · July 8, 2020, 5:21am

Yes. By extending Tables.jl I meant extending it in a backward-compatible manner, so that all of the already defined table types get this new feature/iterator.

If adding this breaks the API in any way, then we should consider a new DictTables.jl package. If we want to define the type as a subtype of AbstractDict, then we will probably need DictTables.jl.

xiaodai · July 8, 2020, 6:06am

You can kinda already do that with

using DataFrames

df = DataFrame(a = ["a", "b", "c"], i = 1:3, j = 4:6)

dfg = groupby(df, :a)

for (key, group) in zip(keys(dfg), dfg)
    println(key)
    println(group)
end

The only thing missing is direct indexability. If you are just iterating through, the above is fine.

pdeffebach · July 8, 2020, 12:06pm

You can index a grouped data frame

julia> dfg[tuple("a")]
1×3 SubDataFrame
│ Row │ a      │ i     │ j     │
│     │ String │ Int64 │ Int64 │
├─────┼────────┼───────┼───────┤
│ 1   │ a      │ 1     │ 4     │

It will return a SubDataFrame and not a DataFrameRow, but that’s still easy to work with.
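If a single row is wanted rather than a one-row SubDataFrame, one way is to take the first row of the group. This is a small sketch assuming a DataFrames.jl version that supports indexing a grouped data frame by key; the variable names `sub` and `row` are only illustrative.

```julia
using DataFrames

df  = DataFrame(a = ["a", "b", "c"], i = 1:3, j = 4:6)
dfg = groupby(df, :a)

sub = dfg[("a",)]   # SubDataFrame with every row whose key is "a"
row = first(sub)    # DataFrameRow; each key has exactly one row in this table
row.i, row.j        # the remaining columns of that row
```

This relies on the key column being unique. With repeated keys, `first` would silently pick the first matching row.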

xiaodai · July 8, 2020, 12:37pm

Then there is really no need for DictTable. Everything works. But is the lookup actually indexed, i.e. is it fast, O(1), as it is for dictionaries?

quinnj · July 8, 2020, 1:14pm

Why is the best representation a Dict of Dicts? If the data is immutable, it could be a NamedTuple of NamedTuples. Or just a custom struct, something like:

struct KeyedRow{K, V} <: AbstractDict{K, V}
    key::K
    row::V
end

Something like this is maybe possible, but nothing has been formally proposed. We tend to try and keep the API surface area as simple and small as possible, but if there’s enough momentum, we could maybe figure something out.
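To make the immutable alternative concrete, here is a minimal sketch of a NamedTuple of NamedTuples in Base Julia, using the first three rows of the example table. One caveat of this representation is that the keys become Symbols rather than Strings. The name `table` is only illustrative.

```julia
table = (
    a = (x2 = 1, y2 = 4),
    b = (x2 = 2, y2 = 5),
    k = (x2 = 3, y2 = 6),
)

table.a.y2   # field access by key, then by column

# `pairs` gives Dict-like iteration over key => row.
for (key, rest_of_the_row) in pairs(table)
    println(key, " => ", rest_of_the_row)
end
```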

Note that the StructTypes.jl package defines StructTypes.idproperty, which lets a custom struct declare which field is its “key”. This is used, for example, in the Strapping.jl package to identify unique rows when building custom structs from (Tables.jl-compatible) result sets.

So using the KeyedRow example from before, we’d tweak the definitions like:

using Tables, StructTypes

struct KeyedRow{V} <: Tables.AbstractRow
    key::Symbol
    row::V
end

# Tables.jl interface for row
Tables.columnnames(x::KeyedRow) = Tables.columnnames(getfield(x, :row))
Tables.getcolumn(x::KeyedRow, i::Int) = Tables.getcolumn(getfield(x, :row), i)
Tables.getcolumn(x::KeyedRow, nm::Symbol) = Tables.getcolumn(getfield(x, :row), nm)

# StructTypes.jl interface
StructTypes.StructType(::Type{<:KeyedRow}) = StructTypes.Struct()
StructTypes.idproperty(::Type{<:KeyedRow}) = :key

With this, you could “wrap” any valid Tables.jl row in the KeyedRow struct and provide the key property. You could then use the normal Tables.jl interface (Tables.rows, etc.), and a KeyedRow would act just like the row it wraps, except that you could also call StructTypes.idproperty to find out which field holds the key.
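A short usage sketch, assuming both Tables.jl and StructTypes.jl are installed. The definitions are repeated so the snippet runs on its own, and the wrapped row is an ordinary NamedTuple chosen for illustration.

```julia
using Tables, StructTypes

struct KeyedRow{V} <: Tables.AbstractRow
    key::Symbol
    row::V
end

# Forward the Tables.jl row interface to the wrapped row.
Tables.columnnames(x::KeyedRow) = Tables.columnnames(getfield(x, :row))
Tables.getcolumn(x::KeyedRow, i::Int) = Tables.getcolumn(getfield(x, :row), i)
Tables.getcolumn(x::KeyedRow, nm::Symbol) = Tables.getcolumn(getfield(x, :row), nm)

StructTypes.StructType(::Type{<:KeyedRow}) = StructTypes.Struct()
StructTypes.idproperty(::Type{<:KeyedRow}) = :key

kr = KeyedRow(:a, (x2 = 1, y2 = 4))

kr.x2                               # column access goes to the wrapped row
Tables.columnnames(kr)              # names come from the wrapped row
getfield(kr, :key)                  # the key is stored next to the row
StructTypes.idproperty(typeof(kr))  # which field holds the key
```

`getfield` is used for the struct's own fields because property access on a `Tables.AbstractRow` is routed to `Tables.getcolumn`.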

nalimilan · July 8, 2020, 1:50pm

Yes, it is since DataFrames 0.21, thanks to @bkamins.
