Fill up and fill down rows

sai_matcha · April 28, 2022, 9:41am

Hi, I have a data frame that looks like this:

using DataFrames
df1 = DataFrame()
df1.id = sort!(repeat(1:3, 5))
df1.a = [1, missing, 2, 3, missing, missing, 2, 3, 4, 5, 1, 2, 3, missing, 5]

I want to fill the missing values in column a with the previous value within the same id.

I want a data frame like this:

df2 = DataFrame()
df2.id = sort!(repeat(1:3,5))
df2.a = [1,1,2,3,3,missing,2,3,4,5, 1,2,3,3,5]

Can somebody help me do this?

feanor12 · April 28, 2022, 10:24am

Here is a rather verbose way of doing it. It updates df1 in place:

for gdf in groupby(df1,:id)
  for row_idx in 2:nrow(gdf)
    if ismissing(gdf.a[row_idx])
      gdf.a[row_idx] = gdf.a[row_idx-1]
    end
  end
end

sai_matcha · April 28, 2022, 10:35am

What kind of modification should I make to fill with the next value within the same id instead?

sai_matcha · April 28, 2022, 10:36am

Is this fine?

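for gdf in groupby(df1,:id)
  # walk from the second-to-last row up to the first, so that a run of
  # consecutive missing values is filled from the next non-missing value
  for row_idx in (nrow(gdf) - 1):-1:1
    if ismissing(gdf.a[row_idx])
      gdf.a[row_idx] = gdf.a[row_idx + 1]
    end
  end
end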

feanor12 · April 28, 2022, 10:38am

It looks ok.

sai_matcha · April 28, 2022, 10:38am

Thanks. Is there any other way of doing it?

feanor12 · April 28, 2022, 10:46am

Here is a one-liner, but I find it hard to comprehend.

combine(groupby(df1, :id), :a => (x -> [x[1], coalesce.(x[2:end], x[1:end-1])...]) => :a)
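
To make the one-liner easier to read, here is a small sketch that pulls its anonymous function out and applies it to plain vectors. The name fill_prev_once is made up for illustration. Note that each element is compared with the original previous element, so only the first missing value in a run of consecutive missing values gets filled.

```julia
# Hypothetical name for the anonymous function used in the one-liner above.
# x[1] is kept as-is; every later element falls back to its original predecessor.
fill_prev_once(x) = [x[1]; coalesce.(x[2:end], x[1:end-1])]

isolated = fill_prev_once([1, missing, 2, 3, missing])  # isolated gaps
run_of_two = fill_prev_once([1, missing, missing, 4])   # two missings in a row

println(isolated)
println(run_of_two)
```

For the data in this thread every gap is a single missing value, so the one-liner gives the requested result; for longer gaps a sequential approach would be needed.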

sai_matcha · April 28, 2022, 10:47am

Thanks

bkamins · April 28, 2022, 10:53am

If you want to update df1 in place, do:

using Impute
for sdf in groupby(df1,:id)
    sdf.a .= Impute.locf(sdf.a)
end

or

transform!(groupby(df1, :id), :a => Impute.locf => :a)

If you want a new data frame:

transform(groupby(df1, :id), :a => Impute.locf => :a)
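
For the earlier question about filling with the next value instead, Impute also provides nocb (next observation carried backward). A minimal sketch, assuming the same df1 as at the top of the thread and that the Impute package is installed:

```julia
using DataFrames, Impute

# Same data as df1 in the first post.
df1 = DataFrame(id = repeat(1:3, inner = 5),
                a = [1, missing, 2, 3, missing, missing, 2, 3, 4, 5, 1, 2, 3, missing, 5])

# Fill each missing value with the next non-missing value within the same id.
# A missing value at the end of a group has no next value and stays missing.
filled = transform(groupby(df1, :id), :a => Impute.nocb => :a)
```

As with locf, replacing transform with transform! would update df1 in place.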

monopolynomial · May 1, 2022, 1:06am

The InMemoryDatasets package has ffill and bfill functions, similar to the pandas ones.

using InMemoryDatasets
ds=Dataset(df1)
modify(IMD.groupby(ds,:id),:a=>ffill!)

rocco_sprmnt21 · February 28, 2023, 9:01pm

An idea taken from an old post of mine:

df = DataFrame(dt1=[missing, 0.2, missing, missing, 1, missing, 5, 6],
               dt2=[9, 0.3, missing, missing, 3, missing, 5, 6])
filldown(v) = accumulate((x, y) -> coalesce(y, x), v, init=v[1])

transform(df, [:dt1, :dt2] .=> filldown, renamecols=false)

fillup(v) = reverse(filldown(reverse(v)))

transform(df, [:dt2, :dt1] .=> [filldown, fillup], renamecols=false)
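
The example above fills whole columns. To respect the id groups from the original question, the same two functions can be passed to transform on a grouped data frame. A sketch, reusing df1 from the first post (the result names down and up are only for illustration):

```julia
using DataFrames

# Same definitions as above.
filldown(v) = accumulate((x, y) -> coalesce(y, x), v, init = v[1])
fillup(v) = reverse(filldown(reverse(v)))

# Same data as df1 in the first post.
df1 = DataFrame(id = repeat(1:3, inner = 5),
                a = [1, missing, 2, 3, missing, missing, 2, 3, 4, 5, 1, 2, 3, missing, 5])

# Each function sees the values of one id at a time, so nothing leaks across groups.
down = transform(groupby(df1, :id), :a => filldown => :a)  # previous value
up = transform(groupby(df1, :id), :a => fillup => :a)      # next value
```

A missing value at the start of a group (for filldown) or at the end of a group (for fillup) has nothing to copy from and stays missing.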

lrnv · February 28, 2023, 9:46pm

If I may take advantage of this discussion to ask: are there any performance reasons not to use the “verbose” loop version?

I know that loops are usually easier on the compiler, but does this reasoning still hold for DataFrames?
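
One relevant point, offered as a sketch and not as a benchmark: a DataFrame does not carry its column types in its own type, so a loop that indexes gdf.a[i] directly cannot be specialized on the element type, while the same loop inside a function that receives the column vector can be. The helper name locf_loop! below is hypothetical, and it assumes a 1-based vector.

```julia
# Hypothetical helper: forward-fill a single column vector in place.
# Because the vector is a function argument, Julia compiles this loop
# for the concrete vector type (a "function barrier").
function locf_loop!(v::AbstractVector)
    for i in (firstindex(v) + 1):lastindex(v)
        if ismissing(v[i])
            v[i] = v[i - 1]   # stays missing if the previous value is missing
        end
    end
    return v
end

a = [1, missing, missing, 4, missing]
locf_loop!(a)

b = [missing, 2, missing]
locf_loop!(b)
```

With a grouped data frame this could be called as locf_loop!(sdf.a) inside the for sdf in groupby(df1, :id) loop, since sdf.a is a view into the parent column.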
