If I have a DataFrame with several columns, I can create a subset of it by selecting rows based on conditions, either by indexing, with subset/@subset, or with filter. Is there a way to wrap this kind of data selection in a function that I can pass an arbitrary number of conditions to, like so:
using DataFrames
using DataFramesMeta
df = DataFrame(:a => [1,2], :b => [3,4], :c => [5,6])
sdf = @subset(df, :a .== 1, :c .== 5)
sdf = subset(df, :a => ByRow(==(1)), :c => ByRow(==(5)))
sdf = filter(x -> x.a == 1 && x.c == 5, df)
function fun(df, conditions...)
    # do stuff
    sdf = @subset(df, <how to pass conditions?>)
    # do more stuff
    return sdf
end
I like the very concise syntax of @subset, but using the macro inside the function might be an issue in itself. However, even when replacing it with subset, I am not sure how to do it. Is there any other way to achieve what I would like to do?
Yeah, if all you are doing is supplying a variable number of conditions to subset, I'm not sure you need to wrap this in another function. Perhaps if you had various switches for which conditions to apply (e.g. if some of them were "defaults" that could be changed), then it could make sense.
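To make the "defaults" idea concrete, here is a minimal sketch of what such a wrapper might look like. The function name `fun_with_defaults`, the `use_defaults` keyword, and the default filter on `:a` are all made up for illustration; nothing here comes from the original post beyond the example DataFrame.

```julia
using DataFrames

df = DataFrame(:a => [1,2], :b => [3,4], :c => [5,6])

# Hypothetical wrapper: `use_defaults` and the default filter on :a are assumptions.
function fun_with_defaults(df, conditions...; use_defaults=true)
    # Default conditions are applied unless the caller switches them off.
    defaults = use_defaults ? (:a => ByRow(==(1)),) : ()
    all_conditions = (defaults..., conditions...)
    # Return an unfiltered copy when there is nothing to apply, so the function
    # does not depend on how subset handles zero conditions.
    isempty(all_conditions) && return copy(df)
    return subset(df, all_conditions...)
end

# Default filter on :a plus an extra filter on :c.
sdf_default = fun_with_defaults(df, :c => ByRow(==(5)))
# Only the caller's filter, with the default switched off.
sdf_custom = fun_with_defaults(df, :a => ByRow(==(2)); use_defaults=false)
```

Since subset combines all of its conditions with a logical AND, the order in which the defaults and the extra conditions are splatted does not change which rows are kept.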
Thanks guys, the solution with subset does work, even with splatting:
using DataFrames
df = DataFrame(:a => [1,2], :b => [3,4], :c => [5,6])
function fun(df, conditions...)
    # do something
    sdf = subset(df, conditions...)
    # do more stuff
    return sdf
end
fun(df, :a => ByRow(==(1)), :c => ByRow(==(5)))
The idea was to load some additional data to compare against the data from the DataFrame, and then plot both. This is going to be manual data exploration; I just wanted to wrap it in a function whose filters I can adjust arbitrarily and then rerun, without having to fiddle with all the plotting commands again.
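For this kind of exploratory use, the same varargs pattern should also carry over to the filter form from the original question, if the conditions are written as row predicates instead of `column => function` pairs. A minimal sketch, where `fun_filter` is a hypothetical name:

```julia
using DataFrames

df = DataFrame(:a => [1,2], :b => [3,4], :c => [5,6])

# Hypothetical variant: each predicate takes a DataFrameRow and returns a Bool.
function fun_filter(df, predicates...)
    # Keep a row only if every predicate holds for it.
    return filter(row -> all(p(row) for p in predicates), df)
end

sdf = fun_filter(df, r -> r.a == 1, r -> r.c == 5)
```

With no predicates at all, `all` over an empty collection is true, so every row is kept. The subset version is likely the better fit for larger tables, since it works on whole columns, but row predicates can be handy when a single condition needs to look at several columns at once.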