DataFrame transform with ByRow can’t handle Date values?

Trying to work with date data in a DataFrame using ByRow, I ran into a curious “MethodError: no method matching keys(::Date)” error, which I reduced to the following trivial example:

using DataFrames, Dates

d = DataFrame(a=1, b=Date("2022-1-1"))
transform(d, [:a] => ByRow(x -> x) => [:c])  # ✅ works as expected, copying column a to new column c
transform(d, [:b] => ByRow(x -> x) => [:c])  # ❌ MethodError: no method matching keys(::Date)

What am I missing here? I’m sure it’s some basic newcomer’s mistake.

Answered my own question. Leaving the answer here in case it helps anyone searching the error message:

The ByRow function can return multiple columns, and thus is supposed to return an array. This is true even when it’s only creating a single column; it still needs to be an array. Except… Julia magically promotes single numbers to arrays in this context, which led me down a false path of omitting the brackets from my ByRow function’s return value.

Here is the solution:

d = DataFrame(a=1, b=Date("2022-1-1"))
transform(d, [:a] => ByRow(x -> [x]) => [:c])  # ✅ also works
transform(d, [:b] => ByRow(x -> [x]) => [:c])  # ✅ fixes the problem above
#                               ↑ ↑
#                         brackets fix it

By putting a vector on the right-hand side you’re indicating that you expect a multi-column output, and DataFrames will attempt to iterate over the output to give you this. So the more obvious way to “fix” your MWE is to have a Symbol (rather than a length-one vector of Symbols) on the RHS:

julia> transform(df, [:b] => ByRow(x -> x) => :c)
1×3 DataFrame
 Row │ a      b           c          
     │ Int64  Date        Date       
─────┼───────────────────────────────
   1 │     1  2022-04-18  2022-04-18

If this is right, maybe there’s an argument here to special-case length-one vectors on the RHS, @bkamins?

Also no need for ByRow here afaict: you could just have (x -> x), or maybe even more obviously identity, as your transformation. (I would actually just write df.c = df[:, :b].)
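To make the alternatives mentioned above concrete, here is a minimal sketch. It assumes the same one-row table as the original MWE (built here with `Date(2022, 1, 1)` rather than by parsing a string); the names `d` and `d2` are only for illustration.

```julia
using DataFrames, Dates

d = DataFrame(a = 1, b = Date(2022, 1, 1))

# Whole-column transformation: `identity` receives the entire column `b`,
# and the plain Symbol `:c` on the RHS asks for a single output column.
d2 = transform(d, :b => identity => :c)

# Direct assignment: `d[:, :b]` makes a copy of column `b`,
# which is then stored as a new column `c` (this mutates `d`).
d.c = d[:, :b]
```

Both routes should leave a `Date` column `c` next to `a` and `b`; the first returns a new data frame, the second modifies `d` in place.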


The point is that if you do e.g. [:a] you indicate that you want to unwrap these length 1 vectors, while if you write e.g. :a you indicate you do not want to unwrap them.

The exact rule is (this is a simplified excerpt from the docstring):

If target_cols is a vector of strings it is assumed that function returns multiple columns.

  1. If fun returns one of AbstractDataFrame, NamedTuple, DataFrameRow, AbstractMatrix then it is expanded into multiple columns.
  2. If fun returns an AbstractVector then each element of this vector must support the keys function, which must return a collection of Symbols, strings or integers; the return value of keys must be identical for all elements. Then as many columns are created as there are elements in the return value of the keys function.
  3. If fun returns a value of any other type then it is assumed that it is a table conforming to the Tables.jl API and the Tables.columntable function is called on it to get the resulting columns and their names.
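As a rough illustration of rule 2, the sketch below uses a hypothetical table `df` and a `ByRow` function that returns a 2-tuple per row. The function as a whole then returns a vector of tuples, and `keys` of each tuple is `Base.OneTo(2)`, so two columns are expected. The column names `:a1` and `:a2` are made up for this example.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3])

# Each row produces a 2-tuple, so every element of the returned vector
# supports `keys` and reports the same two positions (1 and 2).
# The vector of target names on the RHS asks DataFrames to expand them.
out = transform(df, :a => ByRow(x -> (x, x^2)) => [:a1, :a2])
```

With a plain Symbol on the RHS instead of `[:a1, :a2]`, the tuples would not be expanded and would be stored as-is in a single column.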

Now regarding:

Except… Julia magically promotes single numbers to arrays in this context, which led me down a false path of omitting the brackets from my ByRow function’s return value.

Julia does not promote numbers to arrays. What you are hitting here is that numbers in Julia are treated as 0-dimensional containers, so:

julia> x = 1
1

julia> keys(x)
Base.OneTo(1)

julia> x[1]
1
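The contrast with `Date` can be checked without DataFrames at all. The following is a small sketch using only Base and the Dates standard library; the variable names are illustrative.

```julia
using Dates

x = 1
num_dims = ndims(x)               # numbers report zero dimensions
num_keys = collect(keys(x))       # keys(x) is Base.OneTo(1)

day = Date(2022, 1, 1)

# `Date` has no `keys` method, which is exactly what the MethodError reports.
date_has_keys = hasmethod(keys, Tuple{Date})

# Wrapping the value in a one-element vector gives something `keys` accepts.
wrapped_keys = collect(keys([day]))
```

This is why the `Int64` column happened to work with `[:c]` while the `Date` column did not: the per-row values in the first case support `keys`, and in the second they do not unless wrapped.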

Thanks for expanding on this. Do you have an example of where it is helpful to have [:c] behave differently from :c on the RHS?

Here is an example where there is a difference. As noted above, the question is whether you want to unwrap the inner vectors or not:

julia> df = DataFrame(x = [[i] for i in 1:4])
4×1 DataFrame
 Row │ x
     │ Array…
─────┼────────
   1 │ [1]
   2 │ [2]
   3 │ [3]
   4 │ [4]

julia> select(df, :x => [:y])
4×1 DataFrame
 Row │ y
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
   4 │     4

julia> select(df, :x => :y)
4×1 DataFrame
 Row │ y
     │ Array…
─────┼────────
   1 │ [1]
   2 │ [2]
   3 │ [3]
   4 │ [4]