Delete missing values after the last non-missing value in each id

Hi, I have a DataFrame that contains missing values. I want to remove the missing values that come after the last non-missing value in each id.
The DataFrame I have is:

using DataFrames

df1 = DataFrame(id = [1, 1, 1, 1, 1, 1], b = [1, 2, missing, 3, missing, missing])
df2 = DataFrame(id = [2, 2, 2, 2, 2, 2], b = [1, 2, missing, 3, missing, 5])
df3 = DataFrame(id = [3, 3, 3, 3, 3, 3], b = [1, 2, missing, 3, 4, missing])
df = [df1; df2; df3]

Here, for id 1 there are two missing values in df.b after the last non-missing value (3), and for id 3 there is one. I want to remove those trailing missing values.

The DataFrame I want is:

df1 = DataFrame(id = [1, 1, 1, 1], b = [1, 2, missing, 3])
df2 = DataFrame(id = [2, 2, 2, 2, 2, 2], b = [1, 2, missing, 3, missing, 5])
df3 = DataFrame(id = [3, 3, 3, 3, 3], b = [1, 2, missing, 3, 4])
df = [df1; df2; df3]

Can someone help me do this?
Thanks.

Check this option:

vcat([g[1:findlast(!isequal(missing), g.b),:] for g in groupby(df,:id)]...)
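To make the one-liner easier to follow, here is a sketch that splits it into two steps: first compute, for each id group, the position of the last non-missing `b`, then keep rows `1:k` of each group and stack them. It uses `!ismissing`, which for this purpose behaves the same as `!isequal(missing)`; the names `gd`, `ends` and `result` are just illustrative.

```julia
using DataFrames

# The example data from the question, built in one go
df = DataFrame(id = repeat([1, 2, 3], inner = 6),
               b  = [1, 2, missing, 3, missing, missing,
                     1, 2, missing, 3, missing, 5,
                     1, 2, missing, 3, 4, missing])

gd = groupby(df, :id)

# Position of the last non-missing b inside each group (4, 6 and 5 for ids 1, 2, 3)
ends = [findlast(!ismissing, g.b) for g in gd]

# Keep rows 1:k of each group, then stack the pieces back into one DataFrame
result = vcat([g[1:k, :] for (g, k) in zip(gd, ends)]...)
```

Note that this, like the one-liner, assumes every group has at least one non-missing `b`; otherwise `findlast` returns `nothing`.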

With DataFramesMeta

julia> @chain df begin
           groupby(:id)
           @subset let
               last_nonmissing_ind = findlast(!ismissing, :b)
               1:length(:id) .<= last_nonmissing_ind
           end
       end
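For comparison, a roughly equivalent filter can be written without macros using the `subset` function from DataFrames (1.x) on a grouped data frame. This is a sketch: the helper name `keep_until_last_value` is hypothetical, and the `something(..., 0)` guard is an addition so that a group whose `b` is entirely missing is dropped instead of raising an error.

```julia
using DataFrames

df = DataFrame(id = repeat([1, 2, 3], inner = 6),
               b  = [1, 2, missing, 3, missing, missing,
                     1, 2, missing, 3, missing, 5,
                     1, 2, missing, 3, 4, missing])

# Hypothetical helper: true for every row up to and including the last non-missing b.
# something(..., 0) maps `nothing` (a group whose b is all missing) to 0,
# so such a group keeps no rows at all.
keep_until_last_value(b) = (1:length(b)) .<= something(findlast(!ismissing, b), 0)

# subset on a GroupedDataFrame applies the function group by group and,
# by default, returns an ordinary (ungrouped) DataFrame
result = subset(groupby(df, :id), :b => keep_until_last_value)
```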

Has something changed with this code? It worked a few months ago, but now it gives the error below:

MethodError: no method matching (::Colon)(::Int64, ::Nothing)
Closest candidates are:
  (::Colon)(::T, ::Any, ::T) where T<:Real at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/range.jl:41
  (::Colon)(::A, ::Any, ::C) where {A<:Real, C<:Real} at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/range.jl:10
  (::Colon)(::T, ::Any, ::T) where T at /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/base/range.jl:40
  ...
Stacktrace:
 [1] (::var"#65#66")(g::SubDataFrame{DataFrame, DataFrames.Index, Vector{Int64}})
   @ Main ./none:0
 [2] iterate
   @ ./generator.jl:47 [inlined]
 [3] collect_to!(dest::Vector{DataFrame}, itr::Base.Generator{GroupedDataFrame{DataFrame}, var"#65#66"}, offs::Int64, st::Int64)
   @ Base ./array.jl:782
 [4] collect_to_with_first!(dest::Vector{DataFrame}, v1::DataFrame, itr::Base.Generator{GroupedDataFrame{DataFrame}, var"#65#66"}, st::Int64)
   @ Base ./array.jl:760
 [5] collect(itr::Base.Generator{GroupedDataFrame{DataFrame}, var"#65#66"})
   @ Base ./array.jl:734
 [6] top-level scope
   @ none:1

This is the code I used:

a = vcat([g[1:findlast(!isequal(missing), g.DV),:] for g in groupby(amik1a,:ID)]...)

Thanks

Your code is constructing a range:

1:findlast(!isequal(missing), g.DV)

where the endpoint is the result of findlast, which returns nothing if no element matches your predicate. This error:

MethodError: no method matching (::Colon)(::Int64, ::Nothing)

is telling you that there is no method of the Colon operator (:, the infix operator you’re using to construct your range) that takes an integer as its first argument and nothing as its second.
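A quick way to see this in the REPL: `a:b` is syntax for calling the function `:`, which is the singleton instance `Colon()`, and `hasmethod` can check which argument types it accepts. A small sketch:

```julia
# `1:3` is syntax for calling Colon() (bound to the name `:`) on 1 and 3
r = Colon()(1, 3)
println(r == 1:3)                                   # true

# A method exists for two Ints, but none for an Int and `nothing`,
# which is exactly the combination the MethodError complains about
println(hasmethod(Colon(), Tuple{Int, Int}))        # true
println(hasmethod(Colon(), Tuple{Int, Nothing}))    # false
```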

While I can’t guarantee this without an MWE, I assume that for at least one ID group g.DV contains only missing values, so findlast returns nothing and you’re trying to construct the range 1:nothing, which doesn’t work:

julia> 1:findlast(!ismissing, [missing, missing, missing])
ERROR: MethodError: no method matching (::Colon)(::Int64, ::Nothing)
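One possible fix, assuming you want groups with no non-missing values to be dropped entirely, is to replace `nothing` with 0 via `something`, so that such a group becomes the empty slice `g[1:0, :]`. The data below is hypothetical (id 4 stands in for an all-missing group in your data), and the column names `id` and `b` replace your `ID` and `DV`.

```julia
using DataFrames

# Hypothetical data: id 4 has only missing values in b,
# which is what makes findlast return nothing
df = DataFrame(id = [1, 1, 1, 4, 4],
               b  = [1, missing, missing, missing, missing])

findlast(!ismissing, [missing, missing])   # returns nothing

# something(x, 0) returns x unless it is `nothing`, in which case it returns 0,
# so an all-missing group becomes g[1:0, :], an empty DataFrame that adds no rows to vcat
result = vcat([g[1:something(findlast(!ismissing, g.b), 0), :] for g in groupby(df, :id)]...)
```

If you would rather keep such groups as they are, you could use `something(findlast(!ismissing, g.b), nrow(g))` instead.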

Thank you very much


function lostmissing(v)
  w=copy(v)
  while !isempty(w) && ismissing(last(w))  # check !isempty(w) before last(w): last errors on an empty vector
    pop!(w)
  end
  return w
end



combine(groupby(df,:id), :b=>lostmissing)
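By default `combine` names the output column `b_lostmissing`. A sketch of the same call with the original name kept via `:b => lostmissing => :b`, run on the example data from the question (the function is copied from the post above):

```julia
using DataFrames

# Same helper as in the post: drop missing values from the end of a vector
function lostmissing(v)
    w = copy(v)                        # copy so the parent column is not mutated
    while !isempty(w) && ismissing(last(w))
        pop!(w)
    end
    return w
end

df = DataFrame(id = repeat([1, 2, 3], inner = 6),
               b  = [1, 2, missing, 3, missing, missing,
                     1, 2, missing, 3, missing, 5,
                     1, 2, missing, 3, 4, missing])

# Each group returns a (possibly shorter) vector, so combine emits one row per element;
# `=> :b` keeps the original column name instead of the default :b_lostmissing
result = combine(groupby(df, :id), :b => lostmissing => :b)
```

Because `lostmissing` returns an empty vector for an all-missing group, this version also avoids the `1:nothing` error discussed above.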