I have two different implementations of grouping dataframes, and I’m trying to determine why they end up as different types.
Here is my code:
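using CSV, DataFrames
timeseries = CSV.read("link to data file (see below)", DataFrame)
# Implementation 1: group and transform in a single expression
transform!(groupby(sort(timeseries, [:Name, :date]), :Name), :Name => eachindex => :rank)
# Implementation 2: bind the grouped data frame to a name first
gdf = groupby(sort(timeseries, [:Name, :date]), :Name)
transform!(gdf, :Name => eachindex => :rank)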
As far as I can tell, both these implementations should produce the same result, but when I look at the type…
Am I overlooking something super obvious here? Why are these different types?
If you want to reproduce, here is the link to the csv I’m reading: timeseries_0.csv - Google Drive
transform!, select, etc. ungroup a GroupedDataFrame by default and return a normal DataFrame. If you want to retain the grouping, provide the keyword ungroup=false.
From the docstring:
• ungroup::Bool=true : whether the return value of the operation on gd should be a data frame or a GroupedDataFrame.
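To make the keyword concrete, here is a minimal sketch. The toy data and the column names value and total are made up for illustration and are not from the original CSV; it only assumes that DataFrames.jl is installed.

```julia
using DataFrames

# Hypothetical toy data; the column names are assumptions for illustration
df = DataFrame(Name = ["a", "a", "b"], value = [1, 2, 3])
gdf = groupby(df, :Name)

# Default (ungroup = true): the result is a plain DataFrame
flat = select(gdf, :value => sum => :total)
flat isa DataFrame               # true

# ungroup = false: the result keeps the grouping
grouped = select(gdf, :value => sum => :total; ungroup = false)
grouped isa GroupedDataFrame     # true
```

The same keyword is accepted by transform and transform! when they are called on a GroupedDataFrame.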
And so if ungroup is the default behavior, why wouldn’t transform! ungroup gdf in implementation #2, as it does in #1?
This is unrelated in this case.
timeseries is a DataFrame, because you read it in as one with CSV.read. The fact that you later run transform! on it does not change the fact that timeseries is bound to a DataFrame. Running a function on some value does not change a binding of a name to a value that you made earlier.
The same is true of gdf. You bind this name to the return value of groupby, which is a GroupedDataFrame. The fact that you run transform! later does not matter here.
However, the return value of transform! in both cases is a DataFrame, as @skleinbo noted (unless the ungroup=false kwarg is passed).
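The difference between what a name is bound to and what transform! returns can be checked directly. This is only a sketch on hypothetical toy data standing in for the CSV; the names result and kept and the column rank2 are introduced here purely for illustration.

```julia
using DataFrames

# Hypothetical toy data standing in for the CSV from the question
df = DataFrame(Name = ["a", "b", "a", "b"], date = [2, 1, 1, 2])

# sort returns a new data frame, so df itself is never touched below
gdf = groupby(sort(df, [:Name, :date]), :Name)
result = transform!(gdf, :Name => eachindex => :rank)

gdf isa GroupedDataFrame     # true: the name gdf is still bound to the grouped object
result isa DataFrame         # true: the return value is ungrouped by default
"rank" in names(df)          # false: only the sorted copy received the new column

# Passing ungroup = false makes transform! return a GroupedDataFrame instead
kept = transform!(gdf, :Name => eachindex => :rank2; ungroup = false)
kept isa GroupedDataFrame    # true
```

So to work with the ungrouped result, bind the return value of transform! to a name rather than inspecting gdf afterwards.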
It does ungroup in both cases, as I have commented above. But the design of Julia is such that this does not affect the bindings of names to values that you made before running transform!.