Hi, I have a DataFrame that looks like this:
using DataFrames
df1 = DataFrame()
df1.id = sort!(repeat(1:3,5))
df1.a = [1,missing,2,3,missing,missing,2,3,4,5, 1,2,3,missing,5]
I want to fill the missing values in column a with the previous value from the same id.
I want a DataFrame like this:
df2 = DataFrame()
df2.id = sort!(repeat(1:3,5))
df2.a = [1,1,2,3,3,missing,2,3,4,5, 1,2,3,3,5]
Can somebody help me do this?
Here is a fairly verbose way of doing it.
for gdf in groupby(df1, :id)
    for row_idx in 2:nrow(gdf)
        if ismissing(gdf.a[row_idx])
            gdf.a[row_idx] = gdf.a[row_idx-1]
        end
    end
end
What modification should I make to fill each missing value with the next value of the same id instead?
Is this fine?
for gdf in groupby(df1, :id)
    for row_idx in 1:nrow(gdf)-1
        if ismissing(gdf.a[row_idx])
            gdf.a[row_idx] = gdf.a[row_idx + 1]
        end
    end
end
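One caveat with a forward-walking loop for "fill with next value": if a group has two or more missing values in a row, the earlier one is filled from a neighbour that is itself still missing at that point. A minimal sketch of a variant that walks backwards instead, so each filled value is available to the element before it. The helper name `bfill_loop!` is made up for illustration and does not come from any package.

```julia
# Hypothetical helper (not from any package): backward fill in place.
function bfill_loop!(v::AbstractVector)
    # Walk from the second-to-last element down to the first, so that a
    # value filled in one step can be reused by the element before it.
    for i in length(v)-1:-1:1
        if ismissing(v[i])
            v[i] = v[i+1]
        end
    end
    return v
end

# Two consecutive missings at the start, one trailing missing at the end.
a = Union{Missing,Int}[missing, missing, 3, missing]
bfill_loop!(a)
# The trailing missing has no next value, so it stays missing.
```

Since `gdf.a` is a view into the parent column, calling `bfill_loop!(gdf.a)` inside `for gdf in groupby(df1, :id)` should update `df1` group by group.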
Thanks. Is there any other way of doing it?
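Here is a one-liner, but I find it hard to comprehend. Note that it only looks one row back, so in a run of consecutive missing values only the first one gets filled.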
combine(groupby(df1,:id),:a=>(x->[x[1],coalesce.(x[2:end],x[1:end-1])...])=>:a)
If you want to update df1 in-place, do:
using Impute
for sdf in groupby(df1, :id)
    sdf.a .= Impute.locf(sdf.a)
end
or
transform!(groupby(df1, :id), :a => Impute.locf => :a)
If you want a new data frame:
transform(groupby(df1, :id), :a => Impute.locf => :a)
The InMemoryDatasets package has ffill and bfill functions, similar to pandas.
using InMemoryDatasets
ds=Dataset(df1)
modify(IMD.groupby(ds,:id),:a=>ffill!)
An idea taken from an old post of mine:
df = DataFrame(dt1=[missing, 0.2, missing, missing, 1, missing, 5, 6],
               dt2=[9, 0.3, missing, missing, 3, missing, 5, 6])
filldown(v)=accumulate((x,y)->coalesce(y,x), v,init=v[1])
transform(df,[:dt1,:dt2].=>filldown,renamecols=false)
fillup(v)=reverse(filldown(reverse(v)))
transform(df,[:dt2,:dt1].=>[filldown,fillup],renamecols=false)
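The example above fills whole columns. For the original question (fill within each id), the same two helpers can be passed to a grouped `transform`. This is only a sketch under the assumption that `df1` is built as in the first post; the names `filled_down` and `filled_up` are made up for illustration.

```julia
using DataFrames

# Same data as df1 in the original question.
df1 = DataFrame(id = repeat(1:3, inner = 5),
                a  = [1, missing, 2, 3, missing,
                      missing, 2, 3, 4, 5,
                      1, 2, 3, missing, 5])

# Carry the last non-missing value forward; a leading missing stays missing.
filldown(v) = accumulate((x, y) -> coalesce(y, x), v, init = v[1])
# Backward fill is a forward fill on the reversed vector.
fillup(v) = reverse(filldown(reverse(v)))

# Apply each helper per id; transform keeps the original row order.
filled_down = transform(groupby(df1, :id), :a => filldown => :a)
filled_up   = transform(groupby(df1, :id), :a => fillup => :a)
```

Unlike the one-row-lookback approach, `accumulate` carries the last seen value through runs of several missing values.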
If I may take advantage of this discussion to ask: are there any performance reasons not to use the "verbose" loopy version?
I know that loops are usually easier on the compiler, but does this reasoning still hold for DataFrames?
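One consideration that usually comes up here: a DataFrame does not carry its column types in its own type, so a loop that indexes `gdf.a[row_idx]` directly cannot be specialized on the element type. A common workaround is a function barrier, where the column is passed to a small function and the loop runs inside it. The sketch below is not a benchmark, and the helper name `locf_loop!` is made up for illustration.

```julia
using DataFrames

# Hypothetical helper: forward fill in place on a plain vector or view.
# Inside this function the concrete type of `v` is known to the compiler.
function locf_loop!(v::AbstractVector)
    for i in 2:length(v)
        if ismissing(v[i])
            v[i] = v[i-1]
        end
    end
    return v
end

df1 = DataFrame(id = repeat(1:3, inner = 5),
                a  = [1, missing, 2, 3, missing,
                      missing, 2, 3, 4, 5,
                      1, 2, 3, missing, 5])

# gdf.a is a view into df1.a, so the fill is written back to df1.
for gdf in groupby(df1, :id)
    locf_loop!(gdf.a)
end
```

Whether this matters in practice depends on the data size; for a handful of rows any of the approaches in this thread should be fine.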