How to apply a formula to a predefined dataframe's column

input_string = "3.0 + log(x1) + 5.0*x2"
my_data = DataFrame(x1 = rand(10), x2 = rand(10))

function my_function(df::DataFrame, formula::String)
	#want: 3 + log.(my_data[:x1]) + 5.0*my_data[:x2]
end

Below I give an attempt to do so which fails:

macro Name(arg)
   string(arg)
end
# x = "hiiiii"
# @Name(x) returns "x"

function my_function(df::DataFrame, formula::String)
    namedf = @Name(df)
    for name in string.(names(df))
        formula = replace(formula, name, "$(namedf)[:$(name)]")
    end

    μ = @.eval(parse(formula))
    return μ 
end

my_function(my_data, input_string)
ERROR: UndefVarError: df not defined

Is there some reason you are attempting to parse a string rather than simply using an actual function?

For example you could do

f = (x1, x2) -> 3 + log(x1) + 5.0*x2
f.(df.x1, df.x2)

Note that you may also find Query.jl helpful.

Hi, thanks for the reply. I appreciate your solution but I think it wouldn’t work for my purpose.

I have a list of DataFrames with different header names, and I want to apply a user-supplied transformation to the columns of each one by having the user write down the formula explicitly as a string.

For instance, col_1 and col_2 could be the column names, and f and g the two functions.

The user should then input:
f(col_1) + g(col_2)

Obviously I’m not sure what the best way to do this is, so I’m open to suggestions.

You could parse the input string in some way and then use include_string to turn it into a function. That might work well.
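To make that suggestion concrete, here is a minimal sketch. It uses Meta.parse plus eval rather than include_string, and the helper name apply_formula and its cols argument are my own inventions, not something from this thread. A NamedTuple of vectors stands in for the DataFrame so the sketch runs without any packages; df.x1-style access works the same way on a DataFrame.

```julia
# Hypothetical helper (not from the thread): turn a formula string into a
# function of the listed columns, then broadcast that function over them.
function apply_formula(df, formula::AbstractString, cols::Vector{Symbol})
    body = Meta.parse(formula)                        # String -> Expr
    f = eval(Expr(:->, Expr(:tuple, cols...), body))  # e.g. (x1, x2) -> 3.0 + log(x1) + 5.0*x2
    columns = [getproperty(df, c) for c in cols]      # works for a DataFrame or a NamedTuple
    # f was created by eval during this call, so go through invokelatest
    # to avoid a world-age error when calling it.
    return Base.invokelatest(broadcast, f, columns...)
end

# Stand-in data (assumption): a NamedTuple instead of a DataFrame.
data = (x1 = [1.0, 2.0, 4.0], x2 = [0.0, 1.0, 2.0])
μ = apply_formula(data, "3.0 + log(x1) + 5.0*x2", [:x1, :x2])
```

Keep in mind that eval runs whatever the string contains, so this is only reasonable when you trust whoever writes the formula.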

If it’s user input, that sounds like the user could just put in a function. I’m missing the context, so I’m not really sure what the right approach is.

Regardless, if you want to do something like what you’re describing, you could do something like

using MacroTools
using MacroTools: postwalk

function applyformula(form, df::Symbol)
    postwalk(x -> @capture(x, :(y_)) ? :($df[$x]) : x, form)
end
applyformula(form::AbstractString, df::Symbol) = applyformula(Meta.parse(form), df)

macro evalformula(form, df)
    esc(:(@. $(applyformula(form, df))))
end

so that you could do, for example

@evalformula "3.0 + log(:x1) + 5.0*:x2" df

or

eval(applyformula("3.0 + log(:x1) + 5.0*:x2", :df))

Keep in mind this kind of syntactic transformation is what DataFramesMeta.jl and Query.jl already do, so you could also pass strings to those using Meta.parse to transform them into expressions. (You may need to do :(@query_macro $parsed_string) if they don’t provide you with functions.)

One thing I have been thinking about doing for quite a while which is related to this is reading in my LaTeX source code and generating Julia code from it. The procedure would be

  1. Transform LaTeX strings into valid Julia expressions using string tools.
  2. Perform further transformations to the Julia expressions using metaprogramming tools.
  3. Evaluate.

Perhaps you had something like that in mind?
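The three steps above could be sketched roughly as follows. The replacement table covers only a tiny, hypothetical subset of LaTeX (\text{...}, \cdot and a few function names), and the names latex_to_julia and to_expr are made up for illustration.

```julia
# Step 1: rewrite a very small subset of LaTeX into Julia syntax with string tools.
# This replacement table is an illustration, not a complete translator.
function latex_to_julia(src::AbstractString)
    s = replace(src, r"\\text\{([^}]*)\}" => s"\1")    # \text{col_1} -> col_1
    s = replace(s, "\\cdot" => "*")                     # \cdot -> *
    s = replace(s, r"\\(log|exp|sin|cos)\b" => s"\1")  # \log -> log
    return s
end

# Step 2: parse into an expression and wrap it in @. so that it broadcasts.
to_expr(src) = :(@. $(Meta.parse(latex_to_julia(src))))

# Step 3: evaluate. eval runs in global scope, so the variables must be globals.
col_1 = [1.0, 2.0]
col_2 = [0.0, 1.0]
result = eval(to_expr("\\log(\\text{col_1}) + 2 \\cdot \\text{col_2}"))
```

Anything beyond simple token substitutions (fractions, subscripts, implicit multiplication) would need a real parser for step 1.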

Hi~

One thing you can do is bind the user-supplied dataframe to a uniquely named global variable, like this:

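function my_function(df::DataFrame, formula::String)
    # eval runs in global scope, so the dataframe must be reachable through a global.
    global input_data_from_user = df
    for name in string.(names(df))
        # Julia 1.x: replace takes a Pair (old => new).
        formula = replace(formula, name => "input_data_from_user.$(name)")
    end

    # Prepending @. broadcasts every call and operator in the formula over the columns.
    μ = eval(Meta.parse("@. " * formula))
    return μ
end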

Or, if you would rather not keep an extra global around (the assignment above does not copy the dataframe, but it does overwrite that global on every call), another way around it is to have the user specify the dataframe name as a string.

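function my_function(df::DataFrame, df_name::String, formula::String)
    # df_name must name a global variable, because eval runs in global scope.
    for name in string.(names(df))
        formula = replace(formula, name => "$(df_name).$(name)")
    end

    # Prepending @. broadcasts every call and operator in the formula over the columns.
    μ = eval(Meta.parse("@. " * formula))
    return μ
end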

There may be better solutions; this one simply avoids using any other packages. Just my thoughts!
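If the global variable is the part you want to avoid, one possible variant is to substitute the columns themselves into the parsed expression instead of substituting text into the string. This is only a sketch: the names substitute_columns and evaluate_formula are hypothetical, and a NamedTuple of vectors stands in for the dataframe so that it runs without packages (propertynames and getproperty also work on a DataFrame).

```julia
# Hypothetical helper (not from the thread): replace every symbol that names a
# column with the column itself, leaving function names in call position alone.
function substitute_columns(ex, df, cols)
    if ex isa Symbol
        return ex in cols ? getproperty(df, ex) : ex
    elseif ex isa Expr
        start = ex.head === :call ? 2 : 1   # skip the function name in f(args...)
        args = copy(ex.args)
        for i in start:length(args)
            args[i] = substitute_columns(args[i], df, cols)
        end
        return Expr(ex.head, args...)
    else
        return ex                           # numbers and other literals
    end
end

function evaluate_formula(df, formula::AbstractString)
    cols = propertynames(df)                # column names as Symbols
    ex = substitute_columns(Meta.parse(formula), df, cols)
    return eval(:(@. $ex))                  # broadcast every call and operator
end

# Stand-in data (assumption): a NamedTuple instead of a DataFrame.
data = (x1 = [1.0, 2.0], x2 = [0.0, 1.0])
μ = evaluate_formula(data, "3.0 + log(x1) + 5.0*x2")
```

Because whole symbols are matched rather than substrings, a column called x does not clobber one called x1, which the string-replace versions above cannot guarantee. The formula is still passed to eval, so it should come from a trusted source.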
