using DataFrames, DataFramesMeta
# this works
y = DataFrame(a=[1,2,3],b=rand(3),c=rand(3))
@where(y,:a.>2)
# how can I achieve this?
s = :(:a.<2) # this criterion will change
@where(y,s)
# I tried
julia> macro wwhere(y,s)
return :( @where(y, ^(s)) )
end
julia> @wwhere y s
ERROR: MethodError: no method matching ^(::Expr)
# and
julia> macro wwhere(y,s)
return :( @where(y, ^($s)) )
end
julia> @wwhere y s
ERROR: MethodError: no method matching ^(::Expr)
I have a function that needs to subset to different things, so this functionality would be very useful for me.
OK, I made a little progress, but I'm still not completely there. How can I pass the comparison I want to perform (e.g. isless) into the function?
julia> f = function(y,var)
@where(y,_I_(var).<2)
end
(::#45) (generic function with 1 method)
julia> f(y,:a)
1×3 DataFrames.DataFrame
│ Row │ a │ b       │ c        │
├─────┼───┼─────────┼──────────┤
│ 1   │ 1 │ 0.86181 │ 0.295221 │
g = t -> t.>2
f = function(y, var, sel_func)
@where(y, sel_func(_I_(var)))
end
f(y, :a, g)
Though to be entirely honest, it's actually very easy to implement your own select function in a reasonably efficient way by extracting columns and then broadcasting your intended conditions over them. See for example this code. Lines 13 to 23 are already enough for what you need: you get a function choose_data that takes as input the DataFrame and a Dict in which each symbol s is associated with the predicate (a function returning a boolean) to be applied to the column df[s].
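For reference, here is a minimal sketch of such a function. It is not the linked code: the body of choose_data below is an illustrative assumption, written against a recent DataFrames.jl, where df[!, s] replaces the older df[s].

```julia
using DataFrames

# Hypothetical implementation (not the linked code): keep the rows for which
# every predicate in `preds` returns true on its column.
function choose_data(df::AbstractDataFrame, preds::AbstractDict)
    mask = trues(nrow(df))              # start by keeping every row
    for (col, pred) in preds
        mask .&= pred.(df[!, col])      # broadcast the predicate over the column
    end
    return df[mask, :]                  # copy of the selected rows
end

y = DataFrame(a = [1, 2, 3], b = rand(3), c = rand(3))

# Same selection as @where(y, :a .> 2), but the criterion is an ordinary value
# that can be built or swapped at run time.
big = choose_data(y, Dict(:a => x -> x > 2))
```

Because the predicates are plain functions stored in a Dict, changing the criterion means passing a different Dict, with no macro or quoted expression involved.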
Great! Exactly what I wanted. I like your second idea as well, but it doesn't work so well for me, as this is embedded in a @linq pipeline of operations. Great package, by the way!
Personally, I prefer the simple pipeline syntax with @> as it's less "magic" (it's also mentioned in the DataFramesMeta README as an alternative to @linq). If you use the @> syntax you can actually add normal functions in between, as long as they take the DataFrame as first argument and output a DataFrame. The last function in the pipeline is the exception: it can output whatever you like, e.g. a plot or a different way of summarizing your data.
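A small sketch of what that can look like, assuming @> is the threading macro from Lazy.jl and a recent DataFrames.jl. The helper keep_above is a hypothetical name made up for this example.

```julia
using DataFrames
using Lazy: @>   # import only the macro, so no exported names clash with DataFrames

# Hypothetical helper: a normal function that takes the DataFrame as its first
# argument and returns a DataFrame, so it can sit anywhere in the pipeline.
keep_above(df, col, cutoff) = df[df[!, col] .> cutoff, :]

y = DataFrame(a = [1, 2, 3], b = rand(3), c = rand(3))

# @> passes the result of each step as the first argument of the next one.
# The final step (nrow) returns a number rather than a DataFrame.
n = @> y keep_above(:a, 1) select(:a, :b) nrow
```

Here the column and cutoff are ordinary arguments, so the selection criterion can vary at run time without any extra macro.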