Combine function not naming column as cols => function => target_cols implies

Hi there, I’m still new to Julia and trying to figure out why some behavior is not what I would expect.
In the dataframes example for the split-apply-combine strategy here https://dataframes.juliadata.org/stable/man/split_apply_combine/
there is an example:

using DataFrames, CSV, Statistics

path = joinpath(pkgdir(DataFrames), "docs", "src", "assets", "iris.csv");

iris = CSV.read(path, DataFrame)

iris_gdf = groupby(iris, :Species)

combine(iris_gdf, :PetalLength => mean => :myMean)

which works fine and as intended, but when I use an anonymous function like this

combine(iris_gdf, :PetalLength => x -> mean(x) => :myMean)

I get a data frame like this:
[Screenshot: the resulting data frame, with a PetalLength_function column]
Which seems like a bug.
The question is: how do I get the new column for a custom function with a proper name and proper content, without renaming the column in an extra line?
My “bad” solution is like this:

rename(combine(iris_gdf, :PetalLength => x -> mean(x)), "PetalLength_function" => "myMean")

It’s not a bug, just operator precedence: you need to enclose your anonymous function in brackets.


You mean parentheses?
I got it working with

combine(iris_gdf, :PetalLength => (x -> mean(x)) => :myMean)

Could you please explain why

combine(iris_gdf, :PetalLength => x -> mean(x) => :myMean)

is different from

combine(iris_gdf, :PetalLength => (x -> mean(x)) => :myMean)

but

combine(iris_gdf, :PetalLength => mean => :myMean)

and

combine(iris_gdf, :PetalLength => (mean) => :myMean)

is the same?


Like I said it’s just operator precedence:

julia> :x => mean => :x
:x => (Statistics.mean => :x)

julia> :x => x -> mean(x) => :x
:x => var"#3#4"()

julia> :x => (x -> mean(x)) => :x
:x => (var"#5#6"() => :x)

(mean) is just the same as mean, while x -> mean(x) => :x is not the same as (x -> mean(x)) => :x, in the same way that (1) is the same as 1 but 1 + 2 × 3 is not the same as (1 + 2) × 3.
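To make the grouping visible without relying on how the REPL prints anonymous functions, here is a minimal sketch that inspects the fields of each Pair. It uses `sum` from Base instead of `mean` only so that no import is needed; the names `p1`, `p2`, `p3` and the symbols `:x`, `:y` are placeholders for illustration.

```julia
# `=>` groups from the right, so a bare function name ends up
# as the first element of the inner Pair.
p1 = :x => sum => :y
@show p1.second isa Pair

# Without parentheses, the body of `->` extends to the right and
# takes `=> :y` with it: the outer Pair holds only a function.
p2 = :x => v -> sum(v) => :y
@show p2.second isa Function
@show p2.second([1, 2, 3])   # calling it returns a Pair

# With parentheses, the function body stops at `)` and the
# target name becomes part of the nested Pair again.
p3 = :x => (v -> sum(v)) => :y
@show p3.second isa Pair
@show p3.second.second
```

In other words, only the parenthesized form has the same source => function => target shape as the version with a bare function name.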


The Pair(s) are resolved before being passed to the combine function. Let’s look below at what the combine function is receiving as its second argument.

Your original version is a single Pair from the column name to an anonymous function. (The “var” thing denotes an anonymous function.) The output of that anonymous function is itself a Pair, which you can see in the PetalLength_function column of your screenshot.

julia> :PetalLength => x -> mean(x) => :myMean
:PetalLength => var"#1#2"()

Julia reads the above the same way as if you had put parentheses around the final Pair. Written this way, you have only provided combine with a source column and an operation function.

julia> :PetalLength => x -> (mean(x) => :myMean)
:PetalLength => var"#13#14"()

julia> typeof(:PetalLength => x -> (mean(x) => :myMean))
Pair{Symbol, var"#17#18"}

You need to separate the operation function from the new column name with parentheses so that combine sees nested Pairs as its second argument.

julia> :PetalLength => (x -> mean(x)) => :myMean
:PetalLength => (var"#3#4"() => :myMean)

julia> typeof(:PetalLength => (x -> mean(x)) => :myMean)
Pair{Symbol, Pair{var"#5#6", Symbol}}
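As a rough end-to-end check of the two forms, the sketch below runs both through combine on a tiny made-up table. The data frame `df`, its columns `g` and `v`, and the result names `good` and `odd` are hypothetical stand-ins for the iris data, chosen so the example runs without the CSV file.

```julia
using DataFrames, Statistics

# Toy data (hypothetical), standing in for the iris table
df = DataFrame(g = ["a", "a", "b"], v = [1.0, 3.0, 5.0])
gdf = groupby(df, :g)

# Parenthesized: combine receives source => function => target name
good = combine(gdf, :v => (x -> mean(x)) => :myMean)

# Unparenthesized: combine receives source => function only, and that
# function returns a Pair (mean => :myMean) for each group
odd = combine(gdf, :v => x -> mean(x) => :myMean)

@show names(good)    # includes the requested target name
@show names(odd)     # includes an auto-generated name instead
@show eltype(odd[!, 2])
```

The second result is the same situation as in the screenshot: the column gets an automatically generated name and holds Pairs rather than numbers.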

Read here to learn how to format your code snippets:


Thank you for your explanation.
Though, I wonder why

combine(iris_gdf, :PetalLength => mean => :myMean)

works as it does.

In DataFramesMeta.jl (a package which I maintain), it is straightforward to use column names programmatically:

incol = :PetalLength
outcol = :myMean
@combine iris_gdf $outcol = mean($incol)

or even

@by iris :Species $outcol = mean($incol)

There is no -> character in that expression. It is an issue of parsing order between the => and -> symbols.

As nilshg said, think of it like 2 * 3 + 4 * 5 vs. 2 * 3 * 5. With parentheses, the first one changes meaning, as in 2 * (3 + 4) * 5, but the second one doesn’t, as in 2 * (3) * 5.
