Hi,
I am trying to apply a condition to a selection of columns based on one column and the resulting dataframe.
Here is an example:
df
10×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────────
1 │ 0.0981372 0.0804932 0.513072 0.450144 0.52068
2 │ 0.630371 0.968476 0.556494 0.493968 0.347883
3 │ 0.64709 0.643473 0.670181 0.765159 0.609178
4 │ 0.394128 0.437874 0.782004 0.15423 0.58172
5 │ 0.373165 0.783987 0.0135377 0.587961 0.328194
6 │ 0.0741134 0.889961 0.0378584 0.345939 0.567646
7 │ 0.311853 0.122679 0.0823076 0.615964 0.271999
8 │ 0.880035 0.553531 0.491427 0.505312 0.50748
9 │ 0.285774 0.187106 0.547379 0.193307 0.894698
10 │ 0.0776693 0.0266613 0.658087 0.0714121 0.667334
How can I apply a filter (.< df.x5) on the first 2 columns and get all the columns as a result?
I tried this but it is giving me a boolean of the first 2 columns.
df[:, Not(Between(:x3,end))] .> df.x5
10×2 DataFrame
Row │ x1 x2
│ Bool Bool
─────┼──────────────
1 │ false false
2 │ true true
3 │ true true
4 │ false false
5 │ true true
6 │ false true
7 │ true false
8 │ true true
9 │ false false
10 │ false false
I would like a resulting dataframe containing all the columns from x1 to x5 with the filtered row values
I hope this is clear enough …
Thanks!
nilshg
November 2, 2021, 12:12pm
2
There are loads of ways of doing this, maybe the simplest one is:
df[(df.x1 .> df.x5) .& (df.x2 .> df.x5), :]
1 Like
Thanks,
If I have more columns to do the filtering on, is there a way to set a condition that would exclude a certain number of columns (e.g. Not(:x3,end)) from the filtering while keeping them in the output?
sijo
November 2, 2021, 12:51pm
4
Here’s one way:
julia> using DataFrames
julia> df = DataFrame(rand(8,5), :auto)
8×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────
1 │ 0.39997 0.315291 0.564286 0.181416 0.272942
2 │ 0.406449 0.136713 0.869279 0.856068 0.653893
3 │ 0.642072 0.371682 0.581247 0.94918 0.106684
4 │ 0.0550675 0.778223 0.953143 0.0357677 0.562967
5 │ 0.0294628 0.146155 0.480069 0.237095 0.586552
6 │ 0.621742 0.456891 0.635841 0.308975 0.630643
7 │ 0.267329 0.634341 0.269092 0.169279 0.629037
8 │ 0.864646 0.624525 0.553297 0.32189 0.245451
julia> columns = Cols(:x5, Not(Between(:x3, ncol(df))));
julia> myfilter(x, v...) = all(<(x), v);
julia> filter(columns => myfilter, df) # Or subset(df, columns => ByRow(myfilter))
3×5 DataFrame
Row │ x1 x2 x3 x4 x5
│ Float64 Float64 Float64 Float64 Float64
─────┼───────────────────────────────────────────────────
1 │ 0.406449 0.136713 0.869279 0.856068 0.653893
2 │ 0.0294628 0.146155 0.480069 0.237095 0.586552
3 │ 0.621742 0.456891 0.635841 0.308975 0.630643
1 Like
sijo
November 2, 2021, 1:06pm
5
Other solutions:
filter(df) do row
all(<(row.x5), row[Not(Between(:x3, end))])
end
columns = names(df, Not(Between(:x3, ncol(df))))
subset(df, (vcat.(columns, "x5") .=> .<)...)
2 Likes
Thanks! - went for the “filter - row” solution.
Just out of curiosity - could you think of something with DataFramesMeta? using @rsubset maybe?
sijo
November 5, 2021, 10:23am
7
Note that the filter with do block is the slowest solution I think, but maybe it’s fine for your use case.
I don’t think DataFramesMeta helps in this case (its support for multi-column arguments is a bit rudimentary for now). The best I can come up with is
columns = Cols(:x5, Not(Between(:x3, ncol(df))))
myfilter(x, v...) = all(<(x), v)
@subset df $(columns => ByRow(myfilter))
which you could write in one line at the cost of readability…
Maybe @pdeffebach has a better idea?
1 Like