Hi, I have a DataFrame with NaN and missing values, and I'm trying to use groupby and combine with a function I define as:
flujos(x)=mean(filter(!isnan,skipmissing(x - lag(x))))
And then I use groupby and combine as:
df = DataFrame(A = [1:2;4;4;4], B = [5, 1, 2, NaN,8], C=[5:8;3], D=[9:12;7], E=[13:16;6]);
gdf=groupby(df,:A)
combine(gdf, :C =>flujos)
But I get a TypeError that says "reducing over an empty collection is not allowed", which I assume is because my function removes rows from that column and a DataFrame obviously can't work with that. I don't know how to skip the missing values and still do operations on the column. Thank you.
I get a different error than you:
julia> using ShiftedArrays, Statistics, DataFrames;
julia> flujos(x)=mean(filter(!isnan,skipmissing(x - lag(x))))
flujos (generic function with 1 method)
julia> df = DataFrame(A = [1:2;4;4;4], B = [5, 1, 2, NaN,8], C=[5:8;3], D=[9:12;7], E=[13:16;6]);
julia> gdf=groupby(df,:A)
GroupedDataFrame with 3 groups based on key: A
First Group (1 row): A = 1
 Row │ A      B        C      D      E
     │ Int64  Float64  Int64  Int64  Int64
─────┼─────────────────────────────────────
   1 │     1      5.0      5      9     13
⋮
Last Group (3 rows): A = 4
 Row │ A      B        C      D      E
     │ Int64  Float64  Int64  Int64  Int64
─────┼─────────────────────────────────────
   1 │     4      2.0      7     11     15
   2 │     4    NaN        8     12     16
   3 │     4      8.0      3      7      6
julia> combine(gdf, :C =>flujos)
ERROR: ArgumentError: reducing over an empty collection is not allowed
Oh yes, that's my error. I copied the wrong one. (Oops)
The error should be self-explanatory. The first group has only one observation, so x - lag(x) is just missing. That means the mean of skipmissing(x - lag(x)) is undefined, because skipmissing(x - lag(x)) is empty. You should handle this case explicitly in your flujos function.
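One way to handle the empty case could look like the sketch below. The name flujos_safe is hypothetical, and returning missing for a group with no usable differences is an assumption on my part; you may prefer NaN or some other default.

```julia
using DataFrames, ShiftedArrays, Statistics

# Hypothetical variant of flujos: same computation, but it guards against
# the case where no valid differences are left after filtering.
function flujos_safe(x)
    # lag(x) puts `missing` in the first position, so the first difference is always missing
    d = filter(!isnan, skipmissing(x - lag(x)))
    # Assumption: a group with no usable differences yields `missing`
    return isempty(d) ? missing : mean(d)
end

df = DataFrame(A = [1:2; 4; 4; 4], B = [5, 1, 2, NaN, 8], C = [5:8; 3], D = [9:12; 7], E = [13:16; 6])
gdf = groupby(df, :A)
res = combine(gdf, :C => flujos_safe)
```

With this, the two single-row groups (A = 1 and A = 2) get missing in the C_flujos_safe column, and the A = 4 group gets the mean of the differences 1 and -5, which is -2.0.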
OK, but I was looking for a function that skips missing values without deleting them, if there is one. If not, I guess I'll have to handle it in my flujos function.
Thank you.