Trying to work with date data in a DataFrame using ByRow, I ran into a curious “MethodError: no method matching keys(::Date)” error, which I reduced to the following trivial example:
using DataFrames, Dates
d = DataFrame(a=1, b=Date("2022-1-1"))
transform(d, [:a] => ByRow(x -> x) => [:c]) # ✅ works as expected, copying column a to new column c
transform(d, [:b] => ByRow(x -> x) => [:c]) # ❌ MethodError: no method matching keys(::Date)
What am I missing here? I’m sure it’s some basic newcomer’s mistake.
Answered my own question. Leaving the answer here in case it helps anyone searching the error message:
The ByRow function can return multiple columns, and thus is supposed to return an array. This is true even when it’s only creating a single column: it still needs to be an array. Except… Julia magically promotes single numbers to arrays in this context, which led me down a false path of omitting the brackets from my ByRow function’s return value.
Here is the solution:
d = DataFrame(a=1, b=Date("2022-1-1"))
transform(d, [:a] => ByRow(x -> [x]) => [:c]) # ✅ also works
transform(d, [:b] => ByRow(x -> [x]) => [:c]) # ✅ fixes the problem above
#                               ↑ ↑
#                         brackets fix it
By putting a vector on the right-hand side you’re indicating that you are expecting a multi-column output, and DataFrames will attempt to iterate over the output to give you this. So the more obvious way to “fix” your MWE is to have a Symbol (rather than a length-one vector of Symbols) on the RHS:
julia> transform(df, [:b] => ByRow(x -> x) => :c)
1×3 DataFrame
 Row │ a      b           c
     │ Int64  Date        Date
─────┼───────────────────────────────
   1 │     1  2022-04-18  2022-04-18
If this is right, maybe there’s an argument here to special-case length-one vectors on the RHS, @bkamins?
Also no need for ByRow here afaict, you could just have (x -> x) or, maybe even more obvious, identity as your transformation. (I would actually just write df.c = df[:, :b]).
The point is that if you do e.g. [:a] you indicate that you want to unwrap these length 1 vectors, while if you write e.g. :a you indicate you do not want to unwrap them.
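To make the difference concrete, here is a minimal sketch. It is not from the posts above: the target names :y and :m and the use of year/month are only illustrative.

using DataFrames, Dates

d = DataFrame(a=1, b=Date(2022, 1, 1))

# Symbol target: the value returned for each row is stored as-is in one column.
single = transform(d, :b => ByRow(identity) => :c)

# Vector target: the value returned for each row is unpacked, one element per name.
multi = transform(d, :b => ByRow(x -> (year(x), month(x))) => [:y, :m])

With :c the Date goes straight into the new column; with [:y, :m] each returned tuple is split across the two listed columns.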
The exact rule is (here is a simplified entry from the docstring):
If target_cols is a vector of Symbols or strings it is assumed that function returns multiple columns.
If fun returns one of AbstractDataFrame, NamedTuple, DataFrameRow, AbstractMatrix then they are expanded into multiple columns.
If fun returns an AbstractVector then each element of this vector must support the keys function, which must return a collection of Symbols, strings or integers; the return value of keys must be identical for all elements. Then as many columns are created as there are elements in the return value of the keys function.
If fun returns a value of any other type then it is assumed that it is a table conforming to the Tables.jl API and the Tables.columntable function is called on it to get the resulting columns and their names.
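The AbstractVector rule is the one that applies to the MWE, because ByRow collects the per-row results into a vector and each element then has to support keys. A rough way to check this by hand, using only Base and Dates (the variable names are just for illustration):

using Dates

int_keys  = keys(1)                       # defined for numbers
vec_keys  = keys([Date(2022, 1, 1)])      # defined for vectors, so x -> [x] works
date_keys = hasmethod(keys, Tuple{Date})  # whether keys is defined for a bare Date

Both int_keys and vec_keys have a single entry, which is why one target column name is accepted in those cases, while date_keys is false, which matches the MethodError in the original question.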
Now regarding:
Except… Julia magically promotes single numbers to arrays in this context, which led me down a false path of omitting the brackets from my ByRow function’s return value.
Julia does not promote numbers to arrays. You are hitting here the case that numbers in Julia are considered to be 0-dimensional containers, so: