TidierData.jl ― Apply transform across columns while retaining access to other variables in the row

My code is

@chain df begin    
    @mutate(across(startswith("β"), x -> τ*x))
end

where τ is another column in df, but the macro complains that it can’t find τ in the Main module. Is there a way to indicate that I want the variable name to be taken from the dataframe in question? Doing var"df.τ" or using the backtick notation results in the same error.

(I tried _.τ but got the even more cryptic error that "This function should only be called inside of TidierData.jl macros.")

I haven’t tried it out, but perhaps you can try the backtick notation mentioned in the docs.

Or this might require going back to DataFrames.jl. I remember across couldn’t do this previously. Maybe something like:

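using DataFrames

# b and t stand in for the β* columns and τ
f(b, t) = t .* b
vars = Symbol.(names(df, r"^b"))      # columns whose names start with "b"
varpairs = [[v, :t] for v in vars]    # source columns go in a vector, not a tuple
transform(df, varpairs .=> f .=> vars)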
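
For reference, here is a minimal self-contained sketch of the same DataFrames.jl idea, using the column names from the question. The toy data is made up for illustration, and `ByRow(*)` is swapped in for the helper function `f`; treat it as one possible way to write it rather than the definitive one.

```julia
using DataFrames

# Toy data (made up for illustration): two β columns and the multiplier column τ.
df = DataFrame(β1 = [1.0, 2.0], β2 = [3.0, 4.0], τ = [10.0, 100.0])

# Every column whose name starts with "β".
βcols = names(df, r"^β")

# One `[source, "τ"] => ByRow(*) => target` pair per β column.
# Reusing the source name as the target overwrites that column in the result.
result = transform(df, [[c, "τ"] => ByRow(*) => c for c in βcols])
```

Because `transform` returns a new data frame, `df` itself is left untouched; `transform!` would be the in-place variant.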

Yes, I tried this; it seems to look in Main just as if there were no backticks.

That’s a bummer, in dplyr that works out of the box. But thank you for the code! That should do the trick for now.

Thanks for flagging. Will take a look - this may be a limitation or bug within across() but should be fixable.

I figured out a not-terribly-cumbersome way of doing this while staying in TidierData:

@chain df begin
    tmp = _
    @mutate(across(startswith("β"), x -> tmp.τ*x))
end

An annoying side-effect is that the variable gets defined in the outer scope so it’s not really temporary and can overwrite other things with the same name (is it possible to define chain-local variables?). Also, this would obviously not work properly if there’s any row reordering or grouping after assignment.
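
To make the reordering caveat concrete, here is a small sketch in plain DataFrames.jl (not TidierData) with made-up data. The name `tmp` plays the role of the captured data frame, and `sort` stands in for any step that changes row order.

```julia
using DataFrames

# Toy data (made up for illustration).
df = DataFrame(β1 = [1.0, 2.0, 3.0], τ = [10.0, 100.0, 1000.0])

tmp = df                                # reference captured before any reordering
reordered = sort(df, :β1, rev = true)   # new data frame, rows now in order 3, 2, 1

# The captured τ is still in the old row order, so rows get mismatched.
stale = tmp.τ .* reordered.β1           # [30.0, 200.0, 1000.0]

# The reordered frame's own τ stays aligned with its β1.
aligned = reordered.τ .* reordered.β1   # [3000.0, 200.0, 10.0]
```

The mismatch is silent: both products have the right length, so nothing errors.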

I assume a proper implementation would add all the dataframe’s columns to the local scope prior to any transform call? I would be happy to help implement this feature but I am still quite new to Julia so that may require too much handholding to be practical.

You can use _ multiple times, then you just have to assign the first argument, too:

@chain df begin
    @mutate(_, across(startswith("β"), x -> _.τ*x))
end