How to create a NamedTuple from dynamic field names (for a df to dict converter)


#1

I am trying to write a converter from a DataFrame to a Dict whose keys are NamedTuples, but I am stuck on how to create a NamedTuple when the field names are not hard-coded but stored in an array:

using NamedTuples
dimCols    = [:colour,:shape,:border]
dimValues  = ["blue","square","slim"]
#rowKey = @NT(colour,shape,border) # This works
rowKey = @NT(eval(dimCols...))     # This doesn't
a = rowKey(dimValues...)

As an alternative, I considered building the key one field at a time with merge, but I could not get that to work either.


#2
using NamedTuples
dimCols    = [:colour,:shape,:border]
dimValues  = ["blue","square","slim"]

equals(x, y) = Expr(:(=), x, y)
eval(:($NamedTuples.@NT $(equals.(dimCols, dimValues)...)))

Not sure why you would want to do this, though… If you’re looking for type-stable access to dataframes, you should probably be using DataFramesMeta, Query, or LazyQuery.
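
For what it's worth, on Julia 1.0 or later named tuples are built into the language, and the same key can be built without eval or macros. This is only a minimal sketch under that assumption (it does not use NamedTuples.jl, and the names rowKey and rowKey2 are just for illustration):

```julia
# Assumes Julia 1.0 or later (built-in NamedTuple, no NamedTuples.jl).
dimCols   = [:colour, :shape, :border]
dimValues = ["blue", "square", "slim"]

# The field names go into the type parameter as a tuple of Symbols,
# and the values are passed as a tuple in the same order.
rowKey = NamedTuple{Tuple(dimCols)}(Tuple(dimValues))

# Equivalent construction: splat name => value pairs as keywords.
rowKey2 = (; (dimCols .=> dimValues)...)
```

Both forms give a named tuple with fields colour, shape and border, so rowKey.colour returns "blue". Neither needs eval, so they also work inside functions with local variables.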


#3

Thank you. My idea wasn’t related to type stability; I wanted a handy way to write multidimensional equations, like:
[V[r,p,t] = V[r,p,t-1] + Growth[r,p,t] for r in regions, p in products, t in years[2:end] ]

I can do this using normal tuples as keys, but I thought named tuples would let me reference the variables with the dimensions in any order, regardless of the original one, e.g.:

[V[time=t,reg=r,prod=p] = V[r,p,t-1] + Growth[r,p,t] for r in regions, p in products, t in years[2:end] ]

This, however, doesn’t work:

using NamedTuples
myTupleDict      = Dict(("aa","bb") => 1, ("aa","cc") =>2)
myNamedTupleDict = Dict(@NT(one = "aa", two = "bb") =>1, @NT(one = "aa", two = "cc") => 2)

myTupleDict[("aa","bb")]              # 1
myNamedTupleDict[("aa","bb")]         # KeyError
myNamedTupleDict[(two="bb",one="aa")] # KeyError

So I misinterpreted NamedTuples: I thought the whole point was that all of the following would compare equal:

@NT(one = "aa", two = "bb") == @NT(two = "bb", one = "aa") ==  @NT("aa", "bb") ==  ("aa", "bb") 
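
The comparison above can be checked directly. The sketch below assumes Julia 1.0 or later with built-in named tuples (NamedTuples.jl on older versions may behave differently); the variable names a, b and d are only for illustration:

```julia
# Assumes Julia 1.0 or later (built-in NamedTuple).
a = (one = "aa", two = "bb")
b = (two = "bb", one = "aa")

sameOrder   = a == (one = "aa", two = "bb")   # true: same names, same order, same values
swapped     = a == b                          # false: field order is part of the type
plainTuple  = a == ("aa", "bb")               # false: a named tuple is not a plain tuple
valuesMatch = Tuple(a) == ("aa", "bb")        # true once the names are dropped

# Dict lookup follows the same rules, which explains the KeyErrors.
d = Dict(a => 1)
foundSwapped = haskey(d, b)                   # false
foundPlain   = haskey(d, ("aa", "bb"))        # false
```

So field order matters for equality and hashing, and the keys have to be brought into one fixed order before they are used for lookup.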

My implementation of the toDict() function is as follows, but it is pretty useless given the point above:

using DataFrames, NamedTuples

df = DataFrame(
  colour = ["green","blue","white","green","green"],
  shape = ["circle", "triangle", "square","square","circle"],
  border = ["dotted", "line", "line", "line", "dotted"],
  area = [1.1, 2.3, 3.1, 4.2, 5.2]
)

function toDict(df, dimCols, valueCol)
    toReturn = Dict()
    equals(x, y) = Expr(:(=), x, y)
    for r in eachrow(df)
        keyValues = []
        for d in dimCols
           push!(keyValues,r[d])
        end
        rowKey = eval(:($NamedTuples.@NT $(equals.(dimCols, keyValues)...)))
        toReturn[rowKey] = r[valueCol]
    end
    return toReturn
end

myDict = toDict(df,[:colour,:shape,:border],:area)
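
The same conversion can probably be written without eval. The following is only a sketch, assuming Julia 1.0 or later with built-in named tuples; toDictNoEval is a hypothetical name, and plain named tuples stand in for the DataFrame rows so the example is self-contained (with DataFrames one would presumably pass eachrow(df) as rows):

```julia
# Assumes Julia 1.0 or later (built-in NamedTuple).
# Hypothetical eval-free variant: `rows` is any iterable whose elements
# can be indexed by column name, e.g. r[:colour].
function toDictNoEval(rows, dimCols, valueCol)
    names = Tuple(dimCols)                  # field names as a tuple of Symbols
    toReturn = Dict{NamedTuple,Any}()
    for r in rows
        # Collect the key values of this row in the order given by dimCols.
        rowKey = NamedTuple{names}(Tuple(r[d] for d in dimCols))
        toReturn[rowKey] = r[valueCol]
    end
    return toReturn
end

# Stand-in rows (the first two rows of the table above).
rows = [
    (colour = "green", shape = "circle",   border = "dotted", area = 1.1),
    (colour = "blue",  shape = "triangle", border = "line",   area = 2.3),
]

myDict2 = toDictNoEval(rows, [:colour, :shape, :border], :area)
```

Note that in the full table the first and last rows share the key (green, circle, dotted), so with either version the later row overwrites the earlier one.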

Do you think there is an easy way to obtain the desired behaviour, or should I forget NamedTuples for this approach?


#4

Something like this?

assignments = :(regions = 1), :(products = 2)
macro reorder(assignments...)
    dict = Dict(map(assignments) do assignment
        assignment.args
    end)
    :($CartesianIndex($(dict[:regions]), $(dict[:products]), $(dict[:years])))
end

@reorder products = 1 years = 2 regions = 5
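
The same reordering idea might also work without a macro. This is a rough sketch, assuming Julia 1.0 or later with built-in named tuples; dimKey is a hypothetical helper, and the dimension names reg, prod and time are taken from the example in post #3:

```julia
# Assumes Julia 1.0 or later (built-in NamedTuple).
dims = (:reg, :prod, :time)   # canonical dimension order

# Hypothetical helper: accept the dimensions as keywords in any order
# and return a named tuple whose fields follow the canonical order.
dimKey(; kwargs...) = NamedTuple{dims}((; kwargs...))

# Store a value under a canonical key...
V = Dict(dimKey(reg = "us", prod = "wood", time = 2000) => 10.0)

# ...and look it up with the dimensions given in a different order.
v = V[dimKey(time = 2000, prod = "wood", reg = "us")]
```

Because every key goes through dimKey, the order in which the dimensions are written no longer matters. Leaving out a dimension raises an error instead of silently producing a different key.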