How to create a NamedTuple from dynamic field names (for a df to dict converter)

sylvaticus · August 13, 2017, 3:51pm

I am trying to write a converter from DataFrame to Dict where each key is a NamedTuple, but I am stuck on how to create the NamedTuple when the “names” are not hard-coded but stored in an array:

using NamedTuples
dimCols    = [:colour,:shape,:border]
dimValues  = ["blue","square","slim"]
#rowKey = @NT(colour,shape,border) # This works
rowKey = @NT(eval(dimCols...))     # This doesn't
a = rowKey(dimValues...)

I also considered building the key one field at a time with merge, but I could not get that to work either.

bramtayl · August 13, 2017, 4:32pm

using NamedTuples
dimCols    = [:colour,:shape,:border]
dimValues  = ["blue","square","slim"]

equals(x, y) = Expr(:(=), x, y)
eval(:($NamedTuples.@NT $(equals.(dimCols, dimValues)...)))

Not sure why you would want to do this, though. If you’re looking for type-stable access to dataframes, you should probably be using DataFramesMeta, Query, or LazyQuery.
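
For readers on a later Julia version, the eval step may not be needed at all. The following is a minimal sketch, assuming Julia 1.x, where NamedTuple is part of Base rather than provided by the NamedTuples.jl package used in this thread. It builds the key from the same dimCols and dimValues arrays; rowKey2 is just an illustrative name for the second variant.

```julia
# Assumption: Julia 1.x with the built-in NamedTuple (no NamedTuples.jl, no eval).
dimCols   = [:colour, :shape, :border]
dimValues = ["blue", "square", "slim"]

# The field names are a type parameter, so they are passed as a tuple of Symbols.
rowKey = NamedTuple{Tuple(dimCols)}(Tuple(dimValues))

# Alternative: splat (name, value) pairs into the named-tuple syntax.
rowKey2 = (; zip(dimCols, dimValues)...)

println(rowKey.colour)  # access a field by name
```

Because the names are only known at run time, the resulting type cannot be inferred by the compiler, so this is convenient rather than type-stable.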

sylvaticus · August 15, 2017, 9:39am

Thank you. My idea wasn’t related to type stability but to having a handy way to write multidimensional equations, like:
[V[r,p,t] = V[r,p,t-1] + Growth[r,p,t] for r in regions, p in products, t in years[2:end] ]

I can do this with normal tuples as keys, but I thought named tuples would let me index the variables with the dimensions in any order, regardless of the original one, e.g.:

[V[time=t,reg=r,prod=p] = V[r,p,t-1] + Growth[r,p,t] for r in regions, p in products, t in years[2:end] ]

This, however, doesn’t work:

using NamedTuples
myTupleDict      = Dict(("aa","bb") => 1, ("aa","cc") =>2)
myNamedTupleDict = Dict(@NT(one = "aa", two = "bb") =>1, @NT(one = "aa", two = "cc") => 2)

myTupleDict[("aa","bb")]              # 1
myNamedTupleDict[("aa","bb")]         # KeyError
myNamedTupleDict[(two="bb",one="aa")] # KeyError

So I misunderstood NamedTuples: I thought the whole point was to get:

@NT(one = "aa", two = "bb") == @NT(two = "bb", one = "aa") ==  @NT("aa", "bb") ==  ("aa", "bb")
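
The same comparisons can be checked with the NamedTuple built into Base. This is a small sketch assuming Julia 1.x; the NamedTuples.jl package used above may differ in details. It suggests that field order is part of a named tuple's identity, and that the fields can be reselected in a chosen order when a match is needed.

```julia
# Assumption: Julia 1.x built-in NamedTuple.
a = (one = "aa", two = "bb")
b = (two = "bb", one = "aa")

println(a == b)                            # false: field order is part of the type
println(a == ("aa", "bb"))                 # false: a NamedTuple is not a plain Tuple
println(Tuple(a) == ("aa", "bb"))          # true: dropping the names gives the tuple
println(NamedTuple{(:one, :two)}(b) == a)  # true: b's fields reselected in a's order
```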

My implementation of the toDict() function is as follows, but it is of little use given the point above:

using DataFrames, NamedTuples

df = DataFrame(
  colour = ["green","blue","white","green","green"],
  shape = ["circle", "triangle", "square","square","circle"],
  border = ["dotted", "line", "line", "line", "dotted"],
  area = [1.1, 2.3, 3.1, 4.2, 5.2]
)

function toDict(df, dimCols, valueCol)
    toReturn = Dict()
    equals(x, y) = Expr(:(=), x, y)
    for r in eachrow(df)
        keyValues = []
        for d in dimCols
            push!(keyValues, r[d])
        end
        rowKey = eval(:($NamedTuples.@NT $(equals.(dimCols, keyValues)...)))
        toReturn[rowKey] = r[valueCol]
    end
    return toReturn
end

myDict = toDict(df,[:colour,:shape,:border],:area)

Do you think there is an easy way to obtain the desired behaviour, or should I forget NamedTuples for this approach?
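
One possible direction, sketched here under the assumption of Julia 1.x with the built-in NamedTuple, is to store every key with its fields in a fixed (alphabetical) order and to normalise the query the same way. The names canonical, to_dict and lookup are hypothetical helpers, and rows is a plain vector of named tuples standing in for the rows of the DataFrame, with the duplicate row left out.

```julia
# Assumption: Julia 1.x built-in NamedTuple. All helper names are hypothetical.

# Reorder the fields alphabetically so that the same name/value set
# always produces the same key, whatever order it was written in.
canonical(nt::NamedTuple) = NamedTuple{Tuple(sort(collect(keys(nt))))}(nt)

function to_dict(rows, dimCols, valueCol)
    result = Dict{NamedTuple,Any}()
    for r in rows
        key = NamedTuple{Tuple(dimCols)}(r)  # keep only the dimension fields
        result[canonical(key)] = getproperty(r, valueCol)
    end
    return result
end

# Look a value up with the dimensions given as keywords, in any order.
lookup(d::AbstractDict; dims...) = d[canonical((; dims...))]

rows = [
    (colour = "green", shape = "circle",   border = "dotted", area = 1.1),
    (colour = "blue",  shape = "triangle", border = "line",   area = 2.3),
    (colour = "white", shape = "square",   border = "line",   area = 3.1),
]

myDict = to_dict(rows, [:colour, :shape, :border], :area)
println(lookup(myDict, shape = "triangle", border = "line", colour = "blue"))
```

The lookup goes through a function call with keyword arguments rather than through indexing, which keeps the sketch independent of how a given Julia version treats keywords inside square brackets.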

bramtayl · August 15, 2017, 1:59pm

Something like this?

assignments = :(regions = 1), :(products = 2)
macro reorder(assignments...)
    dict = Dict(map(assignments) do assignment
        assignment.args
    end)
    :($CartesianIndex($(dict[:regions]), $(dict[:products]), $(dict[:years])))
end

@reorder products = 1 years = 2 regions = 5
