Convert symbol to expression

amrods · January 4, 2022, 6:04am

I’m trying to complete a DataFrame that omits some years with the average of adjacent years. For example:

using DataFrames

df = DataFrame(year = [1995, 1996, 1997, 1999, 2001], x = float(1:5))

From that, I am trying to obtain:

DataFrame(year = 1995:2001, x = [1, 2, 3, 3.5, 4, 4.5, 5])

This is what I’ve done:

using StatsBase

function tsfill!(df, t=:year, xfill=:x)
    maxy = maximum(df.t)
    miny = minimum(df.t)
    completeyears = miny:maxy
    yearstofill = filter(x -> x ∉ df.t, completeyears)
    for y in yearstofill
        sub = subset(df, t => yr -> yr .== y - 1 .|| yr .== y + 1)
        append!(df, DataFrame(t = y, xfill = mean(sub[:, xfill])))
    end
    sort!(df, t)
    return df
end

That function errors in the expression DataFrame(t = y, xfill = mean(sub[:, xfill])) since t and xfill are Symbols and DataFrame expects a normal expression (i.e. year = y and x = mean(...)). How can I convert from Symbol to “normal expression”?

nilshg · January 4, 2022, 6:30am

You don’t have to, just do DataFrame(t => y) - the constructor called with a pair accepts a string/symbol on the left hand side of the pair.
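For reference, here is a minimal sketch of the function from the question rewritten with pairs. It also swaps df.t for df[!, t], because df.t looks up a column literally named t rather than the column stored in the variable t. The name missingyears and the ByRow predicate are my own choices, and the sketch assumes every missing year has at least one neighbouring year present (otherwise mean of an empty selection gives NaN):

```julia
using DataFrames
using Statistics: mean

function tsfill!(df, t = :year, xfill = :x)
    years = df[!, t]              # column whose name is stored in t
    present = Set(years)
    # Years inside the observed range that have no row yet.
    missingyears = [y for y in minimum(years):maximum(years) if y ∉ present]
    for y in missingyears
        # Rows for the years directly before and after y.
        sub = subset(df, t => ByRow(yr -> yr == y - 1 || yr == y + 1))
        # Pairs accept a Symbol (or String) held in a variable as the column name.
        append!(df, DataFrame(t => y, xfill => mean(sub[!, xfill])))
    end
    sort!(df, t)
    return df
end

df = DataFrame(year = [1995, 1996, 1997, 1999, 2001], x = float(1:5))
tsfill!(df)
```

With the example data this fills 1998 and 2000 with 3.5 and 4.5. As in the original, rows appended earlier in the loop are visible to later iterations, so gaps wider than one year are filled from whatever neighbours exist at that point.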

amrods · January 4, 2022, 7:46am

Is there a package for doing this kind of “administrative task” for DataFrames?

nilshg · January 4, 2022, 8:56am

Not that I’m aware of, although I’m not entirely sure what your definition of “administrative task” is - what you’re doing here is a specific imputation scheme, for which there are packages like Impute.jl, which has substitute

https://invenia.github.io/Impute.jl/stable/api/imputation/#Impute.substitute

as well as a k-nearest neighbour imputation scheme - either of those might be amenable to what you’re doing here although I haven’t tried.
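To make the "imputation" framing concrete: a rough sketch, using only DataFrames.jl rather than Impute.jl, that first turns the omitted years into explicit missing values and then fills them. The variable name full is hypothetical, and the fill loop assumes each gap is a single year wide and never falls on the first or last row:

```julia
using DataFrames

df = DataFrame(year = [1995, 1996, 1997, 1999, 2001], x = float(1:5))

# Join against the complete range of years; years absent from df get x = missing.
full = leftjoin(DataFrame(year = minimum(df.year):maximum(df.year)), df, on = :year)
sort!(full, :year)

# Fill each isolated gap with the mean of its two neighbours.
x = full.x                      # no copy: writes go straight into full
for i in findall(ismissing, x)
    x[i] = (x[i - 1] + x[i + 1]) / 2
end
disallowmissing!(full, :x)      # column is Float64 again once no missings remain
```

Once the gaps are explicit missing values, the same table could also be handed to an imputation package instead of the hand-written loop.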

amrods · January 5, 2022, 4:30am

I mean something like data management for DataFrames. I find the tools provided in the default package to be too low level, so I usually do any data management in Stata before bringing it to Julia.

nilshg · January 5, 2022, 6:52am

Again, very hard to say without a better idea of what you mean by “data management” - I’d say there’s very little that you can do in Stata that you can’t do in DataFrames (mainly manipulations of panel data which leverage Stata’s ability to set an id and time dimension, although I haven’t used Stata seriously in about five years), while at the same time there’s lots of stuff you can do on DataFrames that you would struggle to do in Stata (without resorting to using Mata), simply because you have all the power and expressivity of base Julia at your disposal.

If it’s just about DataFrames being more verbose than Stata (because you have to write df[df.col1 .> 1, :] etc.), you might want to look into DataFramesMeta.
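As a small illustration of the difference in verbosity, the same row filter written three ways. The example table and its column names are made up; @subset and @rsubset are the DataFramesMeta macros I'd reach for here:

```julia
using DataFrames, DataFramesMeta

# Made-up example data.
df = DataFrame(col1 = [0, 1, 2, 3], col2 = ["a", "b", "c", "d"])

plain   = df[df.col1 .> 1, :]       # plain DataFrames indexing
vec     = @subset(df, :col1 .> 1)   # column-wise: the condition is broadcast by hand
rowwise = @rsubset(df, :col1 > 1)   # row-wise: the condition is applied to each row
```

All three return the rows where col1 is greater than 1; the macro forms just let you refer to columns as :col1 without repeating df.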

Other than that I don’t think there are packages for what I would consider “data management” (i.e. filtering, transforming, aggregating data) of DataFrames, as DataFrames is the package designed for this already.

So the most useful thing from your perspective is probably to ask here for solutions to specific “data management” tasks which you think can’t be done in DataFrames.jl.
