Filter a dataframe based on a dictionary

Is it possible to filter a data frame based on a dictionary whose keys are a subset of the column names, keeping only the rows whose values in those columns match the dictionary's values?

For example, something like this (does not work):

using DataFrames

df = DataFrame(a = [1, 2, 3], b = ["a", "b", "c"], c = [1.0, 0, missing])
3×3 DataFrame
 Row │ a      b       c         
     │ Int64  String  Float64?  
─────┼──────────────────────────
   1 │     1  a             1.0
   2 │     2  b             0.0
   3 │     3  c       missing   

fd = Dict(:a=>1,:b=>"a") # filter on this

filter(keys(fd) => (keys(fd)) .== values(fd),df) # try to filter on `fd`

ERROR: MethodError: no method matching getindex(::DataFrames.Index, ::Base.KeySet{Symbol,Dict{Symbol,Any}})

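This should work:

julia> filter(df) do row
           # keep the row only if it matches every key => value pair in fd
           tokeep = true
           for k in keys(fd)
               # isequal (unlike !=) returns a Bool even when a cell is missing
               if !isequal(row[k], fd[k])
                   tokeep = false
               end
           end
           return tokeep
       end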
1×3 DataFrame
 Row │ a      b       c
     │ Int64  String  Float64?
─────┼─────────────────────────
   1 │     1  a            1.0
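The loop above can also be collapsed into a single predicate with `all`. This is a minimal sketch, assuming the same `df` and `fd` as in the question; `isequal` is used instead of `==` so that a `missing` cell produces `false` rather than a `missing` that `filter` cannot interpret as a keep/drop decision.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = ["a", "b", "c"], c = [1.0, 0, missing])
fd = Dict(:a => 1, :b => "a")

# keep a row only if every (column => value) pair in fd matches that row;
# isequal treats missing as a plain non-match instead of propagating missing
result = filter(row -> all(isequal(row[k], v) for (k, v) in fd), df)
```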

I'm not sure how to do this with the source => fun syntax, though.

EDIT: Doing this with the source => fun syntax is very hard, but it will be easier in DataFrames 1.0 (coming in a few months), which adds the subset function.
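For reference, here is a sketch of the subset approach, assuming DataFrames 1.0 or later where `subset` is available. Each dictionary entry becomes one `column => ByRow(predicate)` condition, and `subset` keeps only the rows for which all conditions return `true`. `isequal(v)` is the curried one-argument form from Base, which again treats `missing` as a non-match instead of raising an error.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = ["a", "b", "c"], c = [1.0, 0, missing])
fd = Dict(:a => 1, :b => "a")

# one condition per dictionary entry, e.g. :a => ByRow(isequal(1))
conditions = [k => ByRow(isequal(v)) for (k, v) in fd]

# subset keeps the rows where every condition evaluates to true
result = subset(df, conditions...)
```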


filter accepts a vector of column names as the left-hand side of the pair in its first argument. The problem with what you have above is that keys returns a Base.KeySet (which cannot be used to select columns, hence the MethodError) and values returns a Base.ValueIterator. If you collect them into vectors (for example with a comprehension), it should work. Using your code above:

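using DataFrames

df = DataFrame(a = 1:3, b = ["a", "b", "c"], c = [1., 0, missing])
fd = Dict(:a => 1, :b => "a")
### Changes start here
# keys(fd) and values(fd) iterate a Dict in the same order, so a lines up with the first
# value and b with the second; note that this lambda takes exactly two arguments
filter([key for key in keys(fd)] => ((a, b) -> [a, b] == [val for val in values(fd)]), df)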
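Because the lambda above takes exactly two arguments, it only works while `fd` has two entries. The sketch below generalizes it to any number of keys, assuming (as DataFrames does for `filter(cols => f, df)` with several columns) that the selected values are passed to `f` as positional arguments. The keys are collected once, the values are turned into a tuple in the same order, and a splatted lambda compares them all at once. This relies on `keys(fd)` and `values(fd)` iterating a `Dict` in the same order, which Base guarantees as long as the dictionary is not modified in between.

```julia
using DataFrames

df = DataFrame(a = 1:3, b = ["a", "b", "c"], c = [1.0, 0, missing])
fd = Dict(:a => 1, :b => "a")

# column names as a Vector{Symbol}, which filter accepts as the source
cols = collect(keys(fd))
# the target values as a tuple, in the same order as cols
vals = Tuple(collect(values(fd)))

# the row's selected values arrive as positional arguments; compare them with
# isequal so that missing cells count as non-matches
result = filter(cols => ((args...) -> isequal(args, vals)), df)
```

The values are built once outside the lambda, so they are not recomputed for every row as in the comprehension version.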