DataFramesMeta.jl insert @where subset programmatically?

floswald · October 12, 2017, 10:03am

using DataFrames, DataFramesMeta
# this works
y = DataFrame(a=[1,2,3],b=rand(3),c=rand(3))
@where(y,:a.>2)

# how to achieve that?
s = :(:a.<2)  # this criterion will change
@where(y,s)

# i tried
julia> macro wwhere(y,s)
       return :( @where(y, ^(s)) )
       end
julia> @wwhere y s
ERROR: MethodError: no method matching ^(::Expr)

# and
julia> macro wwhere(y,s)
       return :( @where(y, ^($s)) )
       end
julia> @wwhere y s
ERROR: MethodError: no method matching ^(::Expr)

I have a function that needs to subset to different things, so this functionality would be very useful for me.

floswald · October 12, 2017, 2:06pm

OK, I made a little progress, but I'm still not there completely. How can I also pass into the function the comparison I want to apply (here `.<2`)?

julia> f = function(y,var)
       @where(y,_I_(var).<2)
       end
(::#45) (generic function with 1 method)

julia> f(y,:a)
1×3 DataFrames.DataFrame
│ Row │ a │ b       │ c        │
├─────┼───┼─────────┼──────────┤
│ 1   │ 1 │ 0.86181 │ 0.295221 │

piever · October 12, 2017, 2:20pm

Maybe this does what you want:

g = t -> t.>2
f = function(y, var, sel_func)
    @where(y, sel_func(_I_(var)))
end

f(y, :a, g)
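
For anyone reading this on a recent DataFramesMeta release, where `@where` has been renamed `@subset` and `_I_(var)` has been replaced by `$var`, the same idea might look roughly like the sketch below. This assumes a current DataFrames/DataFramesMeta installation and is not the syntax that was available when this thread was written.

```julia
using DataFrames, DataFramesMeta

# Assumption: recent DataFramesMeta, where `$var` refers to the column
# whose name is stored in the variable `var`.
g = t -> t .> 2          # receives the whole column, returns a Bool vector

f = function (y, var, sel_func)
    @subset(y, sel_func($var))
end

y = DataFrame(a = [1, 2, 3], b = rand(3), c = rand(3))
f(y, :a, g)              # keeps the rows where column :a is greater than 2
```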

Though to be entirely honest, it’s actually very easy to implement your own select function in a reasonably efficient way by extracting columns and then broadcasting your intended conditions over them. See for example this code. Lines 13 to 23 are already enough for what you need: you get a function choose_data that takes as input the DataFrame and a Dict in which each symbol s is associated with the predicate (a function returning a boolean) to be applied to the column df[s].
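
A minimal sketch of that idea, in case the link is not at hand. This is not the linked code: the body of `choose_data` below is only an illustration of "extract the column, broadcast the predicate", and it uses the `df[!, col]` indexing of recent DataFrames versions rather than `df[s]`.

```julia
using DataFrames

# Illustrative helper (an assumption, not the linked implementation):
# keep the rows for which every predicate in `conds` returns true
# on the matching column.
function choose_data(df::AbstractDataFrame, conds::AbstractDict)
    keep = trues(nrow(df))
    for (col, pred) in conds
        # broadcast the scalar predicate over the column and combine the masks
        keep .&= pred.(df[!, col])
    end
    return df[keep, :]
end

y = DataFrame(a = [1, 2, 3], b = rand(3), c = rand(3))
choose_data(y, Dict(:a => (x -> x > 2)))
```

Because the criterion is an ordinary function stored in a Dict, it can change at run time without any macro involved.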

floswald · October 12, 2017, 2:47pm

great! exactly what I wanted. i like your second idea as well, but it doesn’t work so well for me as this is embedded in a @linq pipeline of operations. great package by the way!

piever · October 12, 2017, 3:35pm

Glad you like PlugAndPlot!

Personally, I prefer the simple pipeline syntax with @> as it’s less “magic” (it’s also mentioned on the DataFramesMeta README as an alternative to @linq). If you use the @> syntax you can actually add normal functions in between, as long as they take the DataFrame as first argument and output a DataFrame (unless they are at the end of the pipeline and then they can output whatever, e.g. a plot or a different way of summarizing your data).
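
As a rough sketch of what that can look like, assuming the `@>` macro from Lazy.jl and a made-up helper `keep_rows` (not part of any package) that takes the DataFrame as its first argument:

```julia
using DataFrames, Lazy

# Hypothetical helper: a plain function, DataFrame in, DataFrame out.
keep_rows(df, col, pred) = df[pred.(df[!, col]), :]

y = DataFrame(a = [1, 2, 3], b = rand(3), c = rand(3))

# Each step receives the result of the previous one as its first argument.
result = @> y begin
    keep_rows(:a, x -> x > 1)   # ordinary function in the middle of the pipeline
    sort(:b)                    # still a DataFrame
    nrow                        # last step may return anything, here a row count
end
```

The predicate passed to `keep_rows` is just a value, so the selection can be chosen programmatically before the pipeline runs.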

floswald · October 12, 2017, 3:36pm

ok I see! that sounds pretty cool actually. I should look into that. it’s so easy to get stuck with what “works” at some point. thanks again!
