I am trying to run combine, grouped by one column, on data logged over multiple days, and get the delta (max - min) for each of 300 columns in a dataset with 10 million rows.
- How can I do this efficiently?
- I also noticed that if I combine over 10 columns at a time, the answers are returned instantly. Is there a way to auto-generate the code for combine?
- Is there a way to use a for loop over the columns to combine them the same way?
using DataFrames, Chain
get_delta(x) = maximum(skipmissing(x)) - minimum(skipmissing(x))  # max - min, ignoring missings
# same range for the datestamp column, unwrapped from a period to its integer value
get_delta_days(x) = get_delta(x).value
@time delta_df = @chain data_joined begin
    groupby(["unique_column"])
    combine(
        :column1 => first => :column1,
        :datestamp => get_delta_days => :days,
        :column2_hw => get_delta => :column2_diff,
        :column3_hw => get_delta => :column3_diff,
        # have 300 more columns
    )
    # @combine begin
    #     :days = get_delta_days(:datestamp)
    #     :n = length(:device_id)
    #     :column2_diff = get_delta(:column2)
    #     :column3_diff = get_delta(:column3)
    #     $AsTable = (column2_diff = get_delta(:column2), column3_diff = get_delta(:column3))
    # end
end
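For the auto-generation question, one possible approach is to build the source => function => destination pairs programmatically instead of typing 300 lines. The sketch below is only an illustration on a made-up toy table: it assumes the columns of interest all end in "_hw" (as column2_hw and column3_hw do above) and that the outputs should be named with a "_diff" suffix. The names df, hw_cols, diff_names and ops are hypothetical.

```julia
using DataFrames

# Toy stand-in for data_joined; the values are invented for illustration.
df = DataFrame(
    unique_column = ["a", "a", "b", "b"],
    column2_hw = [1, 4, 10, missing],
    column3_hw = [2.0, 5.5, missing, 7.0],
)

get_delta(x) = maximum(skipmissing(x)) - minimum(skipmissing(x))

# Pick every column whose name ends in "_hw" (assumed naming pattern).
hw_cols = names(df, r"_hw$")

# Derive the output names, e.g. "column2_hw" -> "column2_diff".
diff_names = [replace(c, "_hw" => "_diff") for c in hw_cols]

# Broadcast => to get one source => function => destination pair per column.
ops = hw_cols .=> get_delta .=> diff_names

# combine accepts the whole vector of pairs in a single call.
delta_df = combine(groupby(df, :unique_column), ops)
```

Note that get_delta will throw for a group in which a column is entirely missing, since maximum then sees an empty collection. If one call with all 300 pairs is slow, the ops vector could also be split into smaller chunks (for example with Iterators.partition) and the results joined afterwards; whether that helps is not measured here.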
Pkg and machine info
Julia Version 1.6.7
Commit 3b76b25b64 (2022-07-19 15:11 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin21.4.0)
CPU: Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, westmere)
DataFrames v1.3.5