Summarizing grouped DataFrame where a group is entirely missing

Sorry, I’m now facing another problem… I want to have the option to calculate the linear trend of the variable over time (where time = the index), and I do it like this:

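using DataFrames, GLM

# Slope of a linear trend of y over time (time = position in the vector)
function trendOfVec(y::AbstractArray{Union{Missing, T}}) where T<:Number
    xy = DataFrame(; x = range(1.0, length(y)), y = y)
    lreg = lm(@formula(y ~ x), xy)   # lm drops rows where y is missing
    coef(lreg)[2]                     # coefficient of x, i.e. the slope
end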
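For reference, the slope returned by coef(lreg)[2] is the ordinary least-squares estimate over the complete cases: with I the set of positions i where y_i is not missing (lm drops the other rows) and x_i = i,

```latex
\hat\beta_1 = \frac{\sum_{i \in I} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i \in I} (x_i - \bar{x})^2},
\qquad \bar{x} = \frac{1}{|I|}\sum_{i \in I} x_i, \quad \bar{y} = \frac{1}{|I|}\sum_{i \in I} y_i
```

This is why the x values must keep their original positions: renumbering the remaining observations as 1, 2, … after dropping missings would change the x_i and therefore the slope.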

That does not work with skipmissing in the call fun ∘ skipmissing, because length is not defined for the SkipMissing type. On the other hand, lm can work with missings directly (unlike mean or median, which need skipmissing). So I tried to define
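A small sketch of why the composition fails, using only Base: a SkipMissing iterator does not know how many non-missing elements it holds until it has scanned its parent array, so it declares an unknown size and has no length method. Counting or collecting it works instead.

```julia
s = skipmissing([1.0, missing, 3.0])

# The size is only known after scanning the parent array for missings ...
size_unknown = Base.IteratorSize(typeof(s)) isa Base.SizeUnknown

# ... so there is no length method, and calling it raises a MethodError.
length_fails = try
    length(s)
    false
catch err
    err isa MethodError
end

# Counting or materialising the iterator works instead.
n = count(_ -> true, s)   # 2
v = collect(s)            # [1.0, 3.0]
```

This is why range(1.0, length(y)) inside trendOfVec cannot be applied to the output of skipmissing directly.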

function trendOfVec(y::Base.SkipMissing{Vector{Union{Missing, T}}}) where T<:Number
    trendOfVec(y.x)   # unwrap the SkipMissing and pass on the original vector
end

but that does not work, because the type that skipmissing passes to savefun in

@combine {outvar} = (savefun ∘ skipmissing)({vb}) 

is more complicated.
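A sketch of what the "more complicated" type likely looks like, under one assumption: that the grouped combine behind @combine hands each group's column to the function as a view (a SubArray) rather than as a Vector. Then skipmissing wraps a SubArray, and a method written for SkipMissing{Vector{...}} does not match.

```julia
# Stand-in for a DataFrame column containing missing values.
col = Union{Missing, Float64}[1.0, missing, 3.0, 4.0]

# Assumption: each group arrives as a view into the column, not a new Vector.
grp = view(col, 2:4)
s = skipmissing(grp)

println(typeof(s))   # prints a Base.SkipMissing{SubArray{...}} type

# A method restricted to the concrete Vector parameter does not apply ...
fits_vector_method = s isa Base.SkipMissing{Vector{Union{Missing, Float64}}}
# ... while one written for the abstract wrapper type does.
fits_abstract_method = s isa Base.SkipMissing{<:AbstractVector}
```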

I ended up with

pre = fun == trendOfVec  ? identity : skipmissing

...

{outvar} = (savefun ∘ pre)({vbl})

but I don’t find this satisfying.
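One possible alternative to the fun == trendOfVec ternary, sketched with hypothetical helper names (preprocess and summarise are not part of DataFramesMeta.jl or any package): let each function declare its own preprocessing step through multiple dispatch. The trendOfVec below is a placeholder that only reports how many elements it received, so the effect of the preprocessing is visible.

```julia
using Statistics

# Placeholder for the real trendOfVec (which fits lm): it only returns the
# number of elements it received, including any missings.
trendOfVec(y::AbstractVector) = length(y)

# Hypothetical helper: by default, drop missings before summarising ...
preprocess(f) = skipmissing
# ... but functions that handle missings themselves opt out via dispatch.
preprocess(::typeof(trendOfVec)) = identity

# Hypothetical helper: apply f after its own preprocessing step.
summarise(f, col) = (f ∘ preprocess(f))(col)

col = [1.0, missing, 3.0]
summarise(mean, col)        # 2.0: mean only sees 1.0 and 3.0
summarise(trendOfVec, col)  # 3: trendOfVec sees the raw vector, missing included
```

Inside the macro this would become something like {outvar} = summarise(savefun, {vb}). Supporting another missing-aware function then means adding one preprocess method rather than another branch in the condition.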

Basically, I find it unintuitive that

@combine {outvar} = (savefun ∘ skipmissing)({vb}) 

does not pass a Base.SkipMissing{Vector{Union{Missing, T}}} to savefun, and I don’t know how to achieve that.
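A sketch of one way to get this behaviour, under stated assumptions: loosen the wrapper method to dispatch on Base.SkipMissing{<:AbstractVector}, so it also accepts a SkipMissing around a view. To keep it runnable without GLM, the core method below is a hypothetical package-free stand-in that computes the same least-squares slope over the non-missing entries. Note that .x is an internal, undocumented field of Base.SkipMissing (the original snippet relies on it too), so it may change between Julia versions.

```julia
using Statistics

# Hypothetical stand-in for the GLM-based trendOfVec: OLS slope of y on its
# position, using only the non-missing entries (as lm does).
function trendOfVec(y::AbstractVector)
    keep = findall(!ismissing, y)   # positions of observed values
    x = Float64.(keep)              # time index of each observation
    v = Float64.(y[keep])           # observed values
    xm, vm = mean(x), mean(v)
    return sum((x .- xm) .* (v .- vm)) / sum((x .- xm) .^ 2)
end

# Accept a SkipMissing wrapped around any vector type (Vector, SubArray, ...)
# and pass the unwrapped vector on. `.x` is an internal field of SkipMissing.
trendOfVec(y::Base.SkipMissing{<:AbstractVector}) = trendOfVec(y.x)

col = Union{Missing, Float64}[0.0, 1.0, missing, 3.0, 4.0]
grp = view(col, 2:5)                   # a group as it might arrive in combine
slope = trendOfVec(skipmissing(grp))   # dispatches despite the SubArray parent
```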

Hope I am clear enough…

I was probably not clear enough… but while browsing Discourse I came across a post by @pdeffebach showing that eachindex on a skipmissing object returns the original indices (the positions of the non-missing values in the underlying array), which solves the original problem:

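## This works with the SkipMissing type
using DataFrames, GLM

function trendOfVec(y)
    # eachindex on a SkipMissing yields the positions of the non-missing values
    # in the original vector, so x keeps the original time index
    xy = DataFrame(; x = eachindex(y) |> collect, y = y |> collect)
    lreg = lm(@formula(y ~ x), xy)
    coef(lreg)[2]
end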
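A quick check of the property this fix relies on, using only Base: eachindex of a SkipMissing gives the parent-array positions of the non-missing values, in the same order in which the iterator yields those values, so the collected x and y columns line up.

```julia
y = [2.0, missing, 6.0, 8.0]
s = skipmissing(y)

# Positions of the non-missing values in the parent vector.
idx  = collect(eachindex(s))   # [1, 3, 4]
# The non-missing values themselves, in the same order.
vals = collect(s)              # [2.0, 6.0, 8.0]

# Each index points at the value it is paired with.
pairs_ok = length(idx) == length(vals) && all(y[i] == v for (i, v) in zip(idx, vals))
```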

Still, I believe “harmonizing” the ecosystem, so that all statistical functions behave the “same” way when missings are present, would be very helpful. Currently lm skips them, while mean etc. return missing.
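The Statistics side of that contrast, as a minimal illustration (lm is left out so the snippet needs no packages beyond the standard library): missing propagates through mean unless skipping is requested explicitly.

```julia
using Statistics

y = [1.0, missing, 3.0]

# missing propagates through the reduction ...
m_raw = mean(y)                  # missing
# ... unless it is dropped explicitly first.
m_skip = mean(skipmissing(y))    # 2.0
```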

You might be interested in the following stalled PR to DataFramesMeta.jl
