DataFrames: obtaining the subset of rows by a set of values

nalimilan · October 16, 2018, 5:39pm

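Unfortunately that solution (_filter below) doesn’t fly when there are many columns with heterogeneous types. With 100 Float64 columns plus one Char column, it is roughly 8 times slower than the existing filter and allocates about twice as much memory: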

julia> df = DataFrame(rand(10000, 100));

julia> df.a = 'a'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> @time _filter(x -> x.x1 > 0.5, df);
  0.809720 seconds (3.87 M allocations: 132.545 MiB, 12.98% gc time)

julia> @time filter(x -> x.x1 > 0.5, df);
  0.105625 seconds (2.52 M allocations: 67.293 MiB, 10.89% gc time)

My current thinking is that the ideal interface would be something like filter(x1 -> x1 > 0.5, df), where we would extract the names of the function's arguments to identify which columns (here x1) are actually used. Only those columns would then need to be accessed for each row. That would avoid problems with very large numbers of columns and would offer a compact syntax.
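A minimal sketch of the "only touch the used columns" idea, in plain Julia without DataFrames.jl. It assumes the table is a NamedTuple of equal-length column vectors, and the helper names filter_by_cols and _rowmask are hypothetical. Instead of extracting argument names from the function (which would rely on compiler internals), the sketch takes the used column names explicitly, so only those columns are indexed for each row, however many other columns there are and whatever their types.

```julia
# Hypothetical helper, not part of DataFrames.jl: keep the rows of a column
# table (a NamedTuple of equal-length vectors) for which `f` returns true.
# Only the columns listed in `cols` are passed to `f`; the other columns
# are never accessed row by row.
function filter_by_cols(f, table::NamedTuple, cols::Tuple{Vararg{Symbol}})
    used = map(c -> getproperty(table, c), cols)  # tuple of the needed vectors only
    keep = _rowmask(f, used, length(first(table)))
    return map(v -> v[keep], table)               # apply the row mask to every column
end

# Function barrier: `used` has a concrete tuple type inside this function,
# so the loop is compiled for the element types of the selected columns.
function _rowmask(f, used::Tuple, n::Int)
    keep = falses(n)
    for i in 1:n
        keep[i] = f(map(v -> v[i], used)...)
    end
    return keep
end

t = (x1 = [0.1, 0.7, 0.9], x2 = [1, 2, 3], a = ['a', 'a', 'a'])
r = filter_by_cols(x1 -> x1 > 0.5, t, (:x1,))
```

Here r keeps the rows where x1 > 0.5, so r.x1 == [0.7, 0.9], and the x2 and a columns are subset with the same mask. Passing the names explicitly is the simplest variant; deriving them from the function's argument names would select the same columns with a more compact call.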
