Fill up and fill down rows

Hi, I have a data frame that looks like this:

using DataFrames

df1 = DataFrame()
df1.id = sort!(repeat(1:3, 5))
df1.a = [1, missing, 2, 3, missing, missing, 2, 3, 4, 5, 1, 2, 3, missing, 5]

I want to fill the missing values in column a with the previous value of the same id.

I want a data frame like this:

df2 = DataFrame()
df2.id = sort!(repeat(1:3,5))
df2.a = [1,1,2,3,3,missing,2,3,4,5, 1,2,3,3,5]

Can somebody help me do this?

Here is a quite verbose way of doing it. :wink:

for gdf in groupby(df1,:id)
  for row_idx in 2:nrow(gdf)
    if ismissing(gdf.a[row_idx])
      gdf.a[row_idx] = gdf.a[row_idx-1]
    end
  end
end

What kind of modification should I make to fill with the next value of the same id instead?

Is this fine?

for gdf in groupby(df1,:id)
  for row_idx in 1:nrow(gdf)-1
    if ismissing(gdf.a[row_idx])
      gdf.a[row_idx] = gdf.a[row_idx + 1]
    end
  end
end

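It looks ok when the missing values are isolated. With a run of consecutive missings, the forward loop copies a value that is itself still missing, so iterating in reverse (nrow(gdf)-1:-1:1) would be safer.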
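One caveat with filling up in a forward loop: when two missings are adjacent, the first one is assigned from a neighbour that has not been filled yet. A minimal sketch of a reverse-order variant on a plain vector follows; `fill_next!` is a hypothetical helper name, not something from DataFrames or from this thread.

```julia
# Hypothetical helper (not part of any package): fill each missing with the
# next non-missing value by walking the vector from the end to the start.
function fill_next!(v::AbstractVector)
    for i in lastindex(v)-1:-1:firstindex(v)
        if ismissing(v[i])
            # v[i+1] has already been visited, so it holds a filled value
            # whenever a later non-missing value exists.
            v[i] = v[i+1]
        end
    end
    return v
end

a = [missing, missing, 2, 3, missing]
fill_next!(a)   # the trailing missing has no next value and stays missing
```

Inside the groupby loop this could be called as `fill_next!(gdf.a)`, assuming `gdf.a` is a view into `df1` as in the loops above.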

Thanks. Is there any other way of doing it?

Here is a one-liner, but I find it hard to comprehend.

combine(groupby(df1,:id),:a=>(x->[x[1],coalesce.(x[2:end],x[1:end-1])...])=>:a)
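To unpack the one-liner, here is its anonymous function applied to plain vectors. The name `shift_fill` is made up for illustration. The sketch also shows a limitation: the function only looks one row back, so the second of two consecutive missings is left unfilled.

```julia
# The anonymous function from the one-liner, under a hypothetical name.
# x[2:end] is compared elementwise against x[1:end-1], i.e. each element
# against its immediate predecessor.
shift_fill(x) = [x[1], coalesce.(x[2:end], x[1:end-1])...]

isolated = shift_fill([1, missing, 3])               # single gap is filled
consecutive = shift_fill([1, missing, missing, 4])   # second gap stays missing
```

For data such as `df1`, where no group has two missings in a row after its first element, this gives the same result as the loop.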

Thanks

If you want to update df1 in place, do:

using Impute
for sdf in groupby(df1,:id)
    sdf.a .= Impute.locf(sdf.a)
end

or

transform!(groupby(df1, :id), :a => Impute.locf => :a)

If you want a new data frame:

transform(groupby(df1, :id), :a => Impute.locf => :a)

The InMemoryDatasets package has ffill and bfill, similar to the pandas functions.

using InMemoryDatasets
ds=Dataset(df1)
modify(IMD.groupby(ds,:id),:a=>ffill!)

An idea taken from an old post of mine:

df = DataFrame(dt1=[missing, 0.2, missing, missing, 1, missing, 5, 6],
               dt2=[9, 0.3, missing, missing, 3, missing, 5, 6])
filldown(v)=accumulate((x,y)->coalesce(y,x), v,init=v[1])

transform(df,[:dt1,:dt2].=>filldown,renamecols=false)

fillup(v)=reverse(filldown(reverse(v)))

transform(df,[:dt2,:dt1].=>[filldown,fillup],renamecols=false)
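To make the behaviour of `filldown` and `fillup` concrete, here is a small worked example on the `dt1` values as a plain vector, using only Base Julia. The variable names `down` and `up` are just for illustration.

```julia
# Same definitions as above: carry the last non-missing value forward,
# and fill up by reversing, filling down, and reversing back.
filldown(v) = accumulate((x, y) -> coalesce(y, x), v, init = v[1])
fillup(v) = reverse(filldown(reverse(v)))

dt1 = [missing, 0.2, missing, missing, 1, missing, 5, 6]

down = filldown(dt1)  # [missing, 0.2, 0.2, 0.2, 1.0, 1.0, 5.0, 6.0]
up = fillup(dt1)      # [0.2, 0.2, 1.0, 1.0, 1.0, 5.0, 5.0, 6.0]
```

Note that a leading missing survives `filldown` because there is nothing before it to carry forward. For the original question, the same function should work per group, for example `transform(groupby(df1, :id), :a => filldown => :a)`.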

If I may take advantage of this discussion to ask: are there any performance reasons not to use the “verbose” loopy version?

I know that loops are usually easier on the compiler, but does this reasoning still hold for DataFrames?
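One consideration, offered as an assumption rather than a measured result: a DataFrame does not encode its column types in its own type, so a loop that reads `gdf.a[row_idx]` directly may pay for dynamic dispatch on every iteration. A common workaround is a function barrier, where the loop lives in a small function that receives the column as an argument. The sketch below uses a `Dict{Symbol,Any}` as a stand-in for such a container so that it runs with Base Julia only; `locf!` and `columns` are hypothetical names.

```julia
# Hypothetical kernel: last observation carried forward, in place.
# Julia compiles a specialised method for the concrete type of `v`,
# so the loop body itself is type-stable.
function locf!(v::AbstractVector)
    for i in firstindex(v)+1:lastindex(v)
        if ismissing(v[i])
            v[i] = v[i-1]
        end
    end
    return v
end

# Stand-in for a table whose column types are unknown to the compiler.
columns = Dict{Symbol,Any}(:a => [1, missing, 2, 3, missing])

# One dynamic dispatch happens here, at the call; the loop inside is fast.
locf!(columns[:a])
```

With DataFrames this would correspond to calling `locf!(gdf.a)` inside the groupby loop. Whether it is actually faster than the packaged solutions for a given data size is best checked with a benchmark.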