Filling in missing values within a group using values carried forward and backward

Given a DataFrame with fields :a and :b such that :a is a group variable and :b is a value variable which has many missing values. What is the optimal method to fill the missing values in :b to satisfy the following condition? Replace missing :b value for each group in :a with the closest non-missing row value in :b within that group.

Here is one method. Are there better alternatives? The method employs a function defined within a loop over groups. The function references and updates a default value within the loop. A transform! within the loop uses the function to update the group/parent data frame.

# first, the df
let
    df = DataFrame(a=repeat([1, 2, 3, 4], inner=[10]), b = repeat([[missing, missing, 50, missing, missing]; 10; repeat([missing], 4)], outer=[4])) 
end
# now the update method
let
    df = DataFrame(a=repeat([1, 2, 3, 4], inner=[10]), b = repeat([[missing, missing, 50, missing, missing]; 10; repeat([missing], 4)], outer=[4])) 
    gdf = groupby(df, :a)
    for g in gdf
        dvalue = coalesce(g.b...)
        function cvalue(v)
            cvalue = coalesce(v, dvalue)
            dvalue = cvalue
        end
        transform!(g, :b => ByRow(cvalue) => :c)
    end
    return df
end

Thanks.

filldown(v)=accumulate((x,y)->coalesce(y,x), v,init=v[1])
fillup(v)=reverse(filldown(reverse(v)))

combine(combine(gdf, :b=>filldown =>:b),:b=>fillup)

Your example does not do “the closest non-missing row value” but rather carries forward values and falls back to backward if the first observations are missing. If this is what you want apart from what @rocco_sprmnt21 proposed you can use a combination of locf and nocb functions from Impute.jl.

Both solutions work. And in working through them, I learned more base Julia from @rocco_sprmnt21 and about a neat package from @bkamins who is also correct in restating my problem.

Thanks so much. What a great community!

Slightly modified for clarity and summarized for future seekers:

using DataFrames, Impute

filldown(v) = accumulate( (x, y) -> coalesce(y, x), v, init = v[1] )
fillup(v) = reverse(filldown(reverse(v)))

let
    df = DataFrame(a=repeat([1, 2, 3, 4], inner=[10]), b = repeat([[missing, missing, 50, missing, missing]; 10; repeat([missing], 4)], outer=[4]))

    gdf = groupby(df, :a)
    for g in gdf
        transform!(g, :b => filldown => :c)
        transform!(g, :c => fillup => :c)
    end
    return df
end

let
    df = DataFrame(a=repeat([1, 2, 3, 4], inner=[10]), b = repeat([[missing, missing, 50, missing, missing]; 10; repeat([missing], 4)], outer=[4]))
    gdf = groupby(df, :a)
    for g in gdf
        transform!(g, :b => Impute.locf => :c)
        transform!(g, :c => Impute.nocb => :c)
    end
    return df
end
1 Like

Note that you can also do:

df = DataFrame(a=repeat([1, 2, 3, 4], inner=[10]), b = repeat([[missing, missing, 50, missing, missing]; 10; repeat([missing], 4)], outer=[4]))
transform!(groupby(df, :a), :b => Impute.nocb∘Impute.locf => :c)
1 Like