Can you avoid stating the arguments in an anonymous function?

Lincoln_Hannah · October 15, 2021, 1:57am

Is it possible to avoid stating the arguments in an anonymous function?

x = (a=1,b=2)
( (a,b) -> a+b )(x...)

works fine, but could you do something like
( (_...) -> a+b )(x...)

x would have to contain elements a and b. Otherwise there would be a run-time error.
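A nearby option in plain Julia, as a minimal sketch: keyword arguments can be filled by splatting the NamedTuple after a semicolon. This still names a and b once in the signature, so it does not fully answer the question. The name add_ab and the extra field c are made up for illustration.

```julia
# A NamedTuple with one field the function does not need.
x = (a = 1, b = 2, c = 10)

# Anonymous function with keyword arguments; `rest...` absorbs any extra fields.
add_ab = (; a, b, rest...) -> a + b

# Splat the NamedTuple as keywords, so fields are matched by name, not position.
total = add_ab(; x...)
```

If x lacks a or b, the call throws an UndefKeywordError at run time, which matches the behaviour described above.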

gbaraldi · October 15, 2021, 2:08am

I think you can only do that with varargs, and it’s not exactly what you want, but

x = (a...) -> sum(a)
julia> x(1,2,3)
6

jling · October 15, 2021, 2:11am

What does this mean (i.e. what do you want it to do)? For example, what even are a and b? If you just want to operate on the values and don’t care about the argument names, you can work with the NamedTuple directly:

julia> x
(a = 1, b = 2)

julia> values(x)
(1, 2)

julia> sum(x)
3

Lincoln_Hannah · October 15, 2021, 2:19am

I’d like to use variables in the body of a function without listing them explicitly as function inputs.

(probably not possible. just thought I’d check)

jling · October 15, 2021, 2:27am

how is that logically possible? how does the function know what variables to use then? maybe you’re thinking about closure?

julia> g(y0) = x-> x+y0
g (generic function with 1 method)

julia> g(3)(4)
7
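To spell out the closure idea a little further, here is a minimal sketch in which the function body uses a and b without taking them as arguments. The variables are captured from the enclosing scope when the function is created. The names make_adder and add_ab are hypothetical.

```julia
# The returned function has no arguments; it captures `a` and `b`
# from the scope in which it was defined.
function make_adder()
    a, b = 1, 2
    return () -> a + b
end

add_ab = make_adder()
captured_sum = add_ab()
```

The catch is that the variables must already exist where the function is defined, so this does not pick up fields from a NamedTuple passed in later.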

Lincoln_Hannah · October 15, 2021, 2:45am

Background –
For DataFrames, I’d like an easy way to do row-level operations, and use the column names as variable names and have any created variables added automatically as new columns.

In this discussion a nice method was suggested

but it requires writing a function where you have to specify the column names as inputs and outputs.

function f(; a, b, ... )
    c = a + b
    d = a - b
    (; a, b, c, d) 
end

D = DataFrame( a=[1,2], b=[3,4] )

@chain D begin
       transform( AsTable(:) =>   ByRow(x->f(; x...)) => AsTable )
end
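The row-level mechanism behind this pattern can be sketched without any packages, assuming each row is represented as a NamedTuple. This only illustrates the keyword-splatting idea and is not how DataFrames.jl implements transform. The names add_columns, rows and result are made up.

```julia
# Row function: takes the columns it needs as keywords, ignores the rest,
# and returns only the newly created columns.
function add_columns(; a, b, kwargs...)
    c = a + b
    d = a - b
    return (; c, d)
end

# Two rows, matching D = DataFrame(a=[1,2], b=[3,4]).
rows = [(a = 1, b = 3), (a = 2, b = 4)]

# Splat each row as keywords, then merge the new columns back into the row.
result = map(row -> merge(row, add_columns(; row...)), rows)
```

The column names still have to appear in the function signature and in the returned tuple, which is the repetition the question is about.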

I’d like to avoid re-specifying the column names at all. Something like:

@chain D begin
        ***drop to row level****   begin
            c = a + b
            d = a -  b
            end
end

pdeffebach · October 15, 2021, 2:54am

Yes. DataFramesMeta.jl does exactly this.

julia> df = DataFrame(a = [1, 2, 3], b = [5, 6, 7]);

julia> @rtransform df begin
           :c = :a + :b
           :d = :a - :b
       end
3×4 DataFrame
 Row │ a      b      c      d
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      5      6     -4
   2 │     2      6      8     -4
   3 │     3      7     10     -4

Lincoln_Hannah · October 15, 2021, 3:17am

I have DataFramesMeta installed but for some reason @rtransform isn’t recognised

pdeffebach · October 15, 2021, 3:19am

Check your version. The current version is 0.9.1; @rtransform should be in 0.9.0 and newer.

Lincoln_Hannah · October 15, 2021, 4:02am

works thank you so much.

is there any way to avoid the colons in the variable names?

Related question: there are various DataFrame filtering macros, such as @where and @subset.
All seem to require vectorising the conditional statement, e.g. :a .> 3.

Is there a way of doing @rsubset so you don’t have to vectorise the statement, e.g. :a > 3?

pdeffebach · October 15, 2021, 6:37pm

No, you can’t avoid the colons. Columns are referenced as :x.

The reason for this is that we need a way to distinguish, at parse time, the columns in a data frame from other variables, without knowing anything about the data frame.

x = 1
@rtransform df :y = :x + x
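To illustrate the parse-time distinction, here is a toy sketch of how a macro could tell :x from x by looking only at the expression. It is not DataFramesMeta.jl's actual implementation, and column_refs is a hypothetical helper.

```julia
# In a quoted expression, `:x` parses to a QuoteNode while `x` parses to a bare Symbol.
ex = :(:x + x)

# Collect every quoted symbol (candidate column reference) in an expression.
column_refs(node::QuoteNode) = node.value isa Symbol ? [node.value] : Symbol[]
column_refs(ex::Expr) = reduce(vcat, map(column_refs, ex.args); init = Symbol[])
column_refs(_) = Symbol[]   # literals, bare symbols, etc. are not column references

cols = column_refs(ex)
```

Here ex.args[2] is a QuoteNode and ex.args[3] is a Symbol, so the two uses of x can be separated without ever looking at the data frame.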

Obviously it’s possible to treat unquoted symbols, i.e. x, as column references and leave special syntax for everything else. But then you would have to apply lots of escaping rules.

@rtransform df y = begin 
    $x = 100
    x + z
end

This might get out of hand when people want to use missing, map with a function as the first argument, etc.

I would put a positive spin on this and say the use of :x makes code more readable because you can distinguish easily between columns and variables, which can get confusing in dplyr.

Finally, @rsubset exists in 0.9.0 and newer. @where is deprecated in favor of @subset, so there is just one filtering function in DataFramesMeta.jl.
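The difference between a vectorised condition and a row-wise one can be shown in plain Julia, with a vector standing in for the column :a. This is only an analogy for @subset versus @rsubset, not their implementation; the variable names are made up.

```julia
a = [1, 4, 5]

# Column-wise: the comparison sees the whole vector, so it must be broadcast with `.>`.
mask = a .> 3
kept_columnwise = a[mask]

# Row-wise: the predicate sees one value at a time, so plain `>` is enough.
kept_rowwise = filter(v -> v > 3, a)
```

Both approaches keep the same elements; only the form of the condition differs.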

Lincoln_Hannah · October 16, 2021, 1:21am

@rsubset Awesome thank you

Last question. Within an @rtransform block, new columns can’t depend on other new columns.
Is there a way to have a row level block with multiple interdependent new columns?

pdeffebach · October 16, 2021, 1:28am

Yes! I’m releasing a new version tomorrow with this feature. Stay tuned for the announcement.

Lincoln_Hannah · October 18, 2021, 12:07am

Had some more questions. Put them in a new topic with appropriate heading.
