What I want to do is:
- Group a data frame by a column (`col1`) in DataFrames.jl and call the result `grouped_df`.
- Send the sub-tables of `grouped_df` to a user-defined function.
- Have that function transform an existing column and add a new column with related data.
I want to transform the old column and add the new one in the same function because both actions depend on essentially the same computation, just recorded slightly differently.
```julia
function apply_scan_difference(df::AbstractDataFrame, ref_scan::Float64)
    current_scan::Float64 = calculate_current_scan(df)
    scan_difference::Float64 = current_scan - ref_scan
    df.scan = df.scan .- scan_difference
    insertcols!(df, ncol(df)+1, :shift .= scan_difference) # `scan_difference` has already been calculated, so recording it here would be efficient, right?
    return df::AbstractDataFrame
end
```
As you can see, `scan_difference` is calculated for each group and applied to the old column `scan`, but I also want to record it for each group. Intuitively, doing both in the same function seems like the more efficient approach.
But when I call `combine(sdf -> apply_scan_difference(sdf, ref_scan), grouped_df)`, I get this error: `MethodError: no method matching ndims(::Type{Symbol})`
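A minimal sketch of where that error likely comes from, on a plain ungrouped `DataFrame` with toy values (assumed, not from the post): `insertcols!` takes a `name => value` Pair and recycles a scalar value into a full column, whereas `:shift .= scan_difference` is a broadcast *assignment* whose destination is the Symbol `:shift` itself. Broadcasting into a Symbol asks for `ndims(Symbol)`, which appears to be exactly the reported `MethodError`.

```julia
using DataFrames

df = DataFrame(scan = [1.0, 3.0, 5.0])
scan_difference = 0.5

# Pair syntax: the scalar value is recycled into a column of length nrow(df).
insertcols!(df, ncol(df) + 1, :shift => scan_difference)

# Broadcast assignment with a Symbol as destination fails on its own,
# before insertcols! would even be called.
err = try
    :shift .= scan_difference
    nothing
catch e
    e
end
```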
I then changed the function to:
```julia
function apply_scan_difference(df::AbstractDataFrame, ref_scan::Float64)
    current_scan::Float64 = calculate_current_scan(df)
    scan_difference::Float64 = current_scan - ref_scan
    df.scan = df.scan .- scan_difference
    insertcols!(df, ncol(df)+1, :shift => fill(scan_difference, nrow(df))) # `scan_difference` has already been calculated, so recording it here would be efficient, right?
    return df::AbstractDataFrame
end
```
and called it as `transform(sdf -> apply_scan_difference(sdf, ref_scan), grouped_df)`, but that produces this error:
`ArgumentError: Column shift is already present in the data frame which is not allowed when makeunique=true`
and only applies the new column data to one group (apparently the first one).
I do not fully understand what setting `makeunique=false` means.
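`makeunique` controls what `insertcols!` does when the new column name already exists: with the default `makeunique=false` it throws an `ArgumentError`, and with `makeunique=true` it renames the new column with a suffix (`shift_1`, ...). The sketch below, using assumed toy data, suggests why only the first group got values: each group handed to the function is a `SubDataFrame` view, and inserting a column through such a view adds it to the *parent* data frame (filled with `missing` outside that group), so the next group already sees a `shift` column.

```julia
using DataFrames

df = DataFrame(col1 = [1, 1, 2, 2], scan = [1.0, 3.0, 10.0, 14.0])
gd = groupby(df, :col1)

# Inserting through the first group's SubDataFrame adds `shift` to the parent
# `df`, with `missing` in rows belonging to other groups.
insertcols!(gd[1], :shift => fill(1.0, nrow(gd[1])))

# The second group now sees an existing `shift` column; with the default
# makeunique=false, insertcols! throws an ArgumentError.
second_failed = try
    insertcols!(gd[2], :shift => fill(12.0, nrow(gd[2])))
    false
catch e
    e isa ArgumentError
end

# With makeunique=true the clash is avoided by naming the new column `shift_1`.
insertcols!(gd[2], :shift => fill(12.0, nrow(gd[2])); makeunique = true)
```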
What is the right way to do this in DataFrames.jl? Or does this go against the grain and is it inadvisable?
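One possible approach, sketched under assumptions (toy data, and a hypothetical `calculate_current_scan`, since the post does not show it): rather than mutating each `SubDataFrame`, let the function compute `scan_difference` once and *return* both columns as a `NamedTuple` of vectors, and let `transform` assemble the per-group results. This keeps the single computation per group that motivated the question.

```julia
using DataFrames

# Hypothetical stand-in: the post does not show calculate_current_scan.
calculate_current_scan(df::AbstractDataFrame) = maximum(df.scan)

# Compute scan_difference once per group and return BOTH columns as a
# NamedTuple of vectors; nothing is mutated inside the function.
function apply_scan_difference(df::AbstractDataFrame, ref_scan::Float64)
    scan_difference = calculate_current_scan(df) - ref_scan
    return (scan = df.scan .- scan_difference,
            shift = fill(scan_difference, nrow(df)))
end

df = DataFrame(col1 = [1, 1, 2, 2], scan = [1.0, 3.0, 10.0, 14.0])
grouped_df = groupby(df, :col1)
ref_scan = 2.0

# transform keeps all rows and columns of the parent; the returned `scan`
# replaces the existing column and `shift` is added as a new one.
result = transform(sdf -> apply_scan_difference(sdf, ref_scan), grouped_df)
```

Because DataFrames.jl stitches the returned columns together itself, no group inserts a column into the shared parent, so the `makeunique` clash should not arise. `transform!` with the same arguments would update `df` in place instead of returning a new data frame.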