Grouped Data Frame -- Two different types

Snowy · August 6, 2022, 4:51pm

Hi!
I have two different implementations of grouping a data frame, and I'm trying to understand why they give different types.
Here is my code:

using CSV
using DataFrames

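timeseries = CSV.read("link to data file (see below)", DataFrame)

#1: sort, group by :Name, and add a per-group :rank column in one expression
transform!(groupby(sort(timeseries, [:Name, :date]), :Name), :Name => eachindex => :rank)

#2: the same steps, but the GroupedDataFrame is first bound to gdf
gdf = groupby(sort(timeseries, [:Name, :date]), :Name)
transform!(gdf, :Name => eachindex => :rank)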

As far as I can tell, both of these implementations should produce the same result, but when I look at the types…

julia> typeof(gdf)
GroupedDataFrame{DataFrame}

julia> typeof(timeseries)
DataFrame

Am I overlooking something super obvious here? Where do these different types come from?
If you want to reproduce this, here is the CSV I'm reading: timeseries_0.csv (Google Drive)

Thanks!

skleinbo · August 6, 2022, 5:22pm

Operations like transform, select, etc. ungroup a GroupedDataFrame by default and return a normal DataFrame. If you want to retain the grouping, provide the keyword ungroup=false.

From the docstring of transform:

• ungroup::Bool=true : whether the return value of the operation on gd should be a data frame or a GroupedDataFrame.
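A minimal sketch of the difference, assuming a small made-up DataFrame in place of the original CSV (the data and the names `df`, `res_default` and `res_grouped` are illustrative only). It uses the non-mutating transform so that both calls start from the same grouped data:

```julia
using DataFrames

# Made-up stand-in for the CSV contents (not the original data)
df = DataFrame(Name = ["b", "a", "b", "a"], date = [2, 2, 1, 1])
gdf = groupby(sort(df, [:Name, :date]), :Name)

# Default ungroup=true: the result is an ordinary DataFrame
res_default = transform(gdf, :Name => eachindex => :rank)

# ungroup=false: the result keeps its grouping by :Name
res_grouped = transform(gdf, :Name => eachindex => :rank; ungroup = false)

println(typeof(res_default))  # DataFrame
println(typeof(res_grouped))  # GroupedDataFrame{DataFrame}
```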

Snowy · August 6, 2022, 6:04pm

So if ungrouping is the default behavior, why wouldn't transform! ungroup gdf in implementation #2, as it does in #1?

bkamins · August 6, 2022, 8:47pm

This is unrelated in this case. timeseries is a DataFrame because you read it in as a DataFrame with CSV.read. The fact that you later run transform! on data derived from it does not change the fact that timeseries is bound to a DataFrame. Calling a function on a value does not rebind a name that you bound earlier.
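The same rule can be seen without DataFrames at all. In this plain-Julia analogy (not code from the thread), `v` plays the role of `timeseries`: `sort` without `!` returns a new vector, and mutating that copy neither changes `v` nor rebinds the name.

```julia
v = [3, 1, 2]        # v is bound to a Vector{Int}
w = sort(v)          # sort returns a NEW vector; v is not modified
push!(w, 10)         # mutates w only

println(v)           # [3, 1, 2]
println(w)           # [1, 2, 3, 10]
println(typeof(v))   # Vector{Int64} on a 64-bit system; the binding of v is unchanged
```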

The same applies to gdf. You bound this name to the return value of groupby, which is a GroupedDataFrame. Running transform! on it later does not change that.
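A small sketch of this with made-up data (the variable `res` is hypothetical, added only to capture what transform! returns). As far as I understand the DataFrames implementation, transform! on a GroupedDataFrame updates the parent data frame in place and, with the default ungroup=true, returns that same parent:

```julia
using DataFrames

# Made-up stand-in for the CSV contents
timeseries = DataFrame(Name = ["b", "a", "b", "a"], date = [2, 2, 1, 1])

gdf = groupby(sort(timeseries, [:Name, :date]), :Name)
res = transform!(gdf, :Name => eachindex => :rank)

println(typeof(gdf))          # GroupedDataFrame{DataFrame}: gdf is not rebound
println(typeof(res))          # DataFrame: the ungrouped return value
println(res === parent(gdf))  # true: the sorted parent was updated and returned
```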

However, as @skleinbo noted, the return value of transform! is a DataFrame in both cases (unless the ungroup=false keyword argument is passed).

So it does ungroup in both cases, as I said above. But by Julia's design, this does not affect the bindings of names to values that you made before running transform!.
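In practice, the DataFrame you want is whatever transform! returns, so bind it to a name. Note also that `sort` (without `!`) makes a copy, so with the original code `timeseries` itself never gains a `:rank` column. A sketch of both options, using made-up data and a hypothetical name `ranked`:

```julia
using DataFrames

# Made-up stand-in for the CSV contents
timeseries = DataFrame(Name = ["b", "a", "b", "a"], date = [2, 2, 1, 1])

# Option A: keep the ungrouped result by binding the return value
ranked = transform!(groupby(sort(timeseries, [:Name, :date]), :Name),
                    :Name => eachindex => :rank)
println("rank" in names(timeseries))  # false: only the sorted copy was changed

# Option B: sort timeseries in place, so transform! updates it directly
sort!(timeseries, [:Name, :date])
transform!(groupby(timeseries, :Name), :Name => eachindex => :rank)
println("rank" in names(timeseries))  # true
```

Option B avoids the extra copy, at the cost of reordering the rows of `timeseries` itself.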
