```julia
using DataFrames, StatsBase
function tsfill!(df, t=:year, xfill=:x)
    maxy = maximum(df.t)
    miny = minimum(df.t)
    completeyears = miny:maxy
    yearstofill = filter(x -> x ∉ df.t, completeyears)
    for y in yearstofill
        sub = subset(df, t => yr -> yr .== y - 1 .|| yr .== y + 1)
        append!(df, DataFrame(t = y, xfill = mean(sub[:, xfill])))
    end
    sort!(df, t)
    return df
end
```
That function errors in the expression `DataFrame(t = y, xfill = mean(sub[:, xfill]))`, since `t` and `xfill` are Symbols and `DataFrame` expects a normal expression (i.e. `year = y` and `x = mean(...)`). How can I convert from a Symbol to a “normal expression”?
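A minimal sketch of the usual way around this, assuming DataFrames.jl 1.x: keyword syntax such as `DataFrame(t = y)` always creates a column literally named `t`, so when the column name is held in a variable you can pass `Symbol => values` pairs instead, or build a `NamedTuple` whose field names come from the Symbols. The values `2002` and `3.0` are placeholders.

```julia
using DataFrames

t, xfill = :year, :x  # column names stored in variables, as in tsfill!

# Keyword syntax ignores the variable: this makes a column literally named :t
literal = DataFrame(t = [2002])

# Pairs use the *value* of t (here :year) as the column name
row1 = DataFrame(t => [2002], xfill => [3.0])

# Equivalent: a NamedTuple whose field names are taken from the Symbols
nt = NamedTuple{(t, xfill)}((2002, 3.0))
row2 = DataFrame([nt])
```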
Not that I’m aware of, although I’m not entirely sure what your definition of “administrative task” is. What you’re doing here is a specific imputation scheme, for which there are packages like Impute.jl, which has `substitute`.
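For the neighbour-mean scheme in the question itself, here is a sketch of a corrected `tsfill!` using only DataFrames.jl and the `Statistics` standard library. Besides building the new row from `Symbol => value` pairs, it reads columns with `df[:, t]`, because `df.t` looks up a column literally named `t`. Like the original, it assumes the data frame has exactly the two columns `t` and `xfill`, and that the value column can hold the mean (e.g. `Float64`); the test data are made up.

```julia
using DataFrames, Statistics

# Add a row for every missing period in column `t`, setting column `xfill`
# to the mean of the rows at t - 1 and t + 1. `t` and `xfill` are Symbols.
function tsfill!(df::AbstractDataFrame, t::Symbol=:year, xfill::Symbol=:x)
    years = df[:, t]  # copy of the time column (df.t would look for a column named :t)
    yearstofill = setdiff(minimum(years):maximum(years), years)
    for y in yearstofill
        # neighbouring periods; a row imputed in an earlier iteration counts too
        sub = subset(df, t => (yr -> (yr .== y - 1) .| (yr .== y + 1)))
        # Dict keys are the Symbols' values, so the columns are matched by name
        push!(df, Dict(t => y, xfill => mean(sub[!, xfill])))
    end
    sort!(df, t)
    return df
end
```

Because gaps are filled in increasing order, a run of consecutive missing periods is imputed left to right: the second gap averages over the value just imputed for the first.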
I mean something like data management for DataFrames. I find the tools provided in the default package too low-level, so I usually do any data management in Stata before bringing the data into Julia.
Again, very hard to say without a better idea of what you mean by *data management*. I’d say there’s very little that you can do in Stata that you can’t do in DataFrames (mainly manipulations of panel data which leverage Stata’s ability to set an id and time dimension, although I haven’t used Stata seriously in about five years). At the same time, there’s lots of stuff you can do with DataFrames that you would struggle to do in Stata (without resorting to Mata), simply because you have all the power and expressivity of base Julia at your disposal.
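As an illustration of the panel-data point, here is a sketch of adding the missing id-year combinations with plain DataFrames.jl, roughly what Stata's `tsfill, full` does after `xtset id year`. The column names `id`, `year`, `x` and the data are hypothetical.

```julia
using DataFrames

panel = DataFrame(id = [1, 1, 2, 2], year = [2000, 2002, 2000, 2001],
                  x = [1.0, 3.0, 5.0, 6.0])

# Every (id, year) combination over the observed range of years
grid = crossjoin(DataFrame(id = unique(panel.id)),
                 DataFrame(year = minimum(panel.year):maximum(panel.year)))

# Left join keeps the full grid; combinations not in `panel` get `missing` in x
filled = sort!(leftjoin(grid, panel, on = [:id, :year]), [:id, :year])
```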
If it’s just about DataFrames being more verbose than Stata (because you have to write `df[df.col1 .> 1, :]` etc.), you might want to look into DataFramesMeta.
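A small comparison, assuming a recent DataFramesMeta.jl (which provides `@subset` and the row-wise `@rsubset`); the data frame is a made-up example. Inside the macros, columns are referred to as `:col1` instead of repeating `df.`.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(col1 = [0, 1, 2, 3], col2 = ["a", "b", "c", "d"])

# Plain DataFrames.jl: index rows with a Boolean vector
base = df[df.col1 .> 1, :]

# DataFramesMeta: whole-column condition, so the comparison is broadcast
meta = @subset(df, :col1 .> 1)

# Row-wise variant: the condition is written for a single row, no dots needed
meta_r = @rsubset(df, :col1 > 1)
```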
Other than that, I don’t think there are packages for what I would consider “data management” (i.e. filtering, transforming, aggregating data) of DataFrames, as DataFrames.jl is already the package designed for this.
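For reference, a sketch of those three operations in plain DataFrames.jl (1.x assumed); the columns `g`, `v` and the data are hypothetical. The `combine(groupby(...))` step is roughly what Stata's `collapse (mean) v, by(g)` does.

```julia
using DataFrames, Statistics

df = DataFrame(g = ["a", "a", "b"], v = [1.0, 3.0, 5.0])

# Filtering: keep rows where v > 1 (returns a new data frame)
kept = filter(:v => >(1.0), df)

# Transforming: add a column computed from an existing one, in place
transform!(df, :v => (v -> v .* 2) => :v2)

# Aggregating: mean of v per group; sort=true makes the group order deterministic
agg = combine(groupby(df, :g; sort = true), :v => mean => :v_mean)
```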
So the most useful thing from your perspective is probably to ask here for solutions to specific “data management” tasks which you think can’t be done in DataFrames.jl.