DataFramesMeta.jl insert @where subset programmatically?

question

#1
using DataFrames, DataFramesMeta
# this works
y = DataFrame(a=[1,2,3],b=rand(3),c=rand(3))
@where(y,:a.>2)

# how can I achieve the same when the criterion is stored in a variable?
s = :(:a.<2)  # this criterion will change
@where(y,s)

# I tried
julia> macro wwhere(y,s)
       return :( @where(y, ^(s)) )
       end
julia> @wwhere y s
ERROR: MethodError: no method matching ^(::Expr)

# and
julia> macro wwhere(y,s)
       return :( @where(y, ^($s)) )
       end
julia> @wwhere y s
ERROR: MethodError: no method matching ^(::Expr)

I have a function that needs to subset to different things, so this functionality would be very useful for me.


#2

OK, I made a little progress, but I'm still not completely there. How can I pass the comparison I want to perform (here, the .<2 test) into the function?

julia> f = function(y,var)
       @where(y,_I_(var).<2)
       end
(::#45) (generic function with 1 method)

julia> f(y,:a)
1×3 DataFrames.DataFrame
│ Row │ a │ b       │ c        │
├─────┼───┼─────────┼──────────┤
│ 1   │ 1 │ 0.86181 │ 0.295221 │

#3

Maybe this does what you want:

g = t -> t.>2
f = function(y, var, sel_func)
    @where(y, sel_func(_I_(var)))
end

f(y, :a, g)

Though to be entirely honest, it’s actually very easy to implement your own select function in a reasonably efficient way: extract the columns and broadcast your intended conditions over them. See for example this code. Lines 13 to 23 are already enough for what you need: they give you a function choose_data that takes the DataFrame and a Dict as input, where each symbol s is associated with the predicate (a function returning a Boolean) to be applied to the column df[s].
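
The linked code is not reproduced in this thread, so here is only a rough sketch of what such a function could look like. The name choose_data comes from the post above; the body, the argument types and the sample data are my own assumptions, and it assumes a recent DataFrames.jl (1.x) where df[!, col] returns a column.

```julia
using DataFrames

# Hypothetical reconstruction: keep the rows for which every predicate
# in `conditions` returns true on its column.
function choose_data(df::AbstractDataFrame, conditions::AbstractDict)
    keep = trues(nrow(df))              # start by keeping every row
    for (col, pred) in conditions
        keep .&= pred.(df[!, col])      # broadcast the predicate over the column
    end
    return df[keep, :]
end

# Made-up data with fixed values so the result is reproducible
y = DataFrame(a = [1, 2, 3], b = [0.1, 0.5, 0.9], c = [0.3, 0.2, 0.7])

# Each column symbol maps to a predicate applied element by element
sel = choose_data(y, Dict(:a => (x -> x < 3), :b => (x -> x > 0.2)))
```

Because the conditions live in an ordinary Dict, the criterion can be built or changed at run time without any macro.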


#4

Great! Exactly what I wanted. I like your second idea as well, but it doesn’t work so well for me, as this is embedded in a @linq pipeline of operations. Great package, by the way!


#5

Glad you like PlugAndPlot!

Personally, I prefer the simple pipeline syntax with @> as it’s less “magic” (it’s also mentioned in the DataFramesMeta README as an alternative to @linq). If you use the @> syntax you can actually add normal functions in between, as long as they take the DataFrame as first argument and output a DataFrame (unless they are at the end of the pipeline, in which case they can output anything, e.g. a plot or a different way of summarizing your data).
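
To make the pipeline idea concrete, here is a minimal sketch. It assumes @> is the threading macro from Lazy.jl, which inserts the running value as the first argument of each call. The helper keep_rows and the sample data are hypothetical, made up for illustration.

```julia
using DataFrames, Lazy

# Hypothetical helper: an ordinary function that takes the DataFrame as
# its first argument and returns a DataFrame, so it fits into the pipeline.
keep_rows(df, col, pred) = df[pred.(df[!, col]), :]

y = DataFrame(a = [1, 2, 3], b = [0.9, 0.5, 0.1])

# Equivalent to sort(keep_rows(y, :a, x -> x > 1), :b)
result = @> y begin
    keep_rows(:a, x -> x > 1)
    sort(:b)
end
```

Since keep_rows is a plain function, the column and the predicate can be ordinary variables chosen at run time.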


#6

OK, I see! That sounds pretty cool actually. I should look into that. It’s so easy to get stuck with what “works” at some point. Thanks again!