Requiring a NamedTuple instead of just a Tuple to return multiple columns from a function in DataFrames.jl seems unintuitive. Discussion

I tried to read the transform docs for DataFrames.jl and I have to say it’s tough going, though I think they contain lots of info.

The situation I want to discuss is actually covered in the docs, but the question is: how do I return multiple columns from a function in the cols => function => target_cols triplet syntax?

Based on the docs, to make it work the function needs to return a NamedTuple; returning a Tuple doesn’t work.

I feel this is unintuitive. What considerations went into making a returned Tuple cause an error? For example, we could check whether the output of the function is a Tuple and dispatch on that somehow, right?

using DataFrames


b=DataFrame(val=rand(100))

function do2(val::AbstractArray)
    println("ok3")
    (meh=val/2, ok=val/3)
end

# approach 1: the function returns a NamedTuple of vectors; this works
b2 = transform(b, :val => do2 => [:a, :b]);

function do3(val::AbstractArray)
    println("ok4")
    (val / 2, val / 3)
end

# approach 2: the function returns a plain Tuple of vectors; this fails
b2 = transform(b, :val => do3 => [:a, :b]);

PS. How do I return multiple columns using a framework such as DataFramesMeta.jl, DataFrameMacros.jl, or TidierData.jl?

A NamedTuple of vectors is considered a column-oriented table. This is a Tables.jl rule.

A Tuple of “something” is considered a row-oriented table. This is also a Tables.jl rule.

DataFrames.jl inherits these rules.

You get an error because you have a Tuple that is column oriented, which is not supported by Tables.jl.
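
To make the column-oriented versus row-oriented distinction concrete, here is a minimal sketch that is not from the original posts. It only uses the DataFrame constructor, and the variable names are made up for illustration. It builds the same small table once from a NamedTuple of vectors (columns) and once from a vector of NamedTuples (rows):

```julia
using DataFrames

# Column-oriented source: a NamedTuple whose fields are whole columns.
columns = (a = [1, 2], b = [3, 4])

# Row-oriented source: a vector whose elements are individual rows.
rows = [(a = 1, b = 3), (a = 2, b = 4)]

df_from_columns = DataFrame(columns)
df_from_rows = DataFrame(rows)

# Both layouts describe the same 2x2 table.
println(df_from_columns == df_from_rows)
```

The point of the sketch is only that the container type signals which layout the data is assumed to have, which is why a Tuple holding whole columns does not match the expected pattern.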

In short - a function does not have to return a NamedTuple. Your problem is that a Tuple follows a different assumed data layout. But you could return something else, e.g. a Matrix. Here is an example (which works):

julia> function do4(val::AbstractArray)
           [val/2 val/3]
       end
do4 (generic function with 1 method)

julia> b2 = transform(b, :val => do4 => [:a, :b]);
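
If you already have a function that returns a plain Tuple of vectors, another option is to wrap it so the result becomes a NamedTuple before DataFrames.jl sees it. This is only a sketch under assumptions: the helper as_columns and the function halves_and_thirds are hypothetical names introduced here, not part of DataFrames.jl.

```julia
using DataFrames

df = DataFrame(val = [2.0, 4.0, 6.0])

# A function that returns a plain Tuple of vectors (the layout that errors).
halves_and_thirds(val) = (val ./ 2, val ./ 3)

# Hypothetical helper: wraps `f` so that its Tuple result is converted
# into a NamedTuple with the given field names.
as_columns(f, colnames) = (args...) -> NamedTuple{colnames}(f(args...))

# The wrapped function now returns a NamedTuple of vectors, so the
# cols => function => target_cols form accepts it.
out = transform(df, :val => as_columns(halves_and_thirds, (:a, :b)) => [:a, :b])
```

The wrapper keeps the original function unchanged and moves the naming of the outputs to the call site, at the cost of one extra closure.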
OK. It just feels unintuitive, and I wonder why that is the case, since it feels natural to return just a tuple. I would say this is somewhat of a quirk.

I guess @quinnj decided that it is much more typical to have an iterator of rows than an iterator of columns.