Is it possible to filter a data frame based on a dictionary whose keys are a subset of the columns, keeping only the rows whose values in those columns match the values in the dict?
For example, something like this (does not work):
df = DataFrame(a=[1,2,3],b=["a","b","c"],c=[1.,0,missing])
3×3 DataFrame
 Row │ a      b       c
     │ Int64  String  Float64?
─────┼──────────────────────────
   1 │     1  a             1.0
   2 │     2  b             0.0
   3 │     3  c       missing
fd = Dict(:a=>1,:b=>"a") # filter on this
filter(keys(fd) => (keys(fd)) .== values(fd),df) # try to filter on `fd`
ERROR: MethodError: no method matching getindex(::DataFrames.Index, ::Base.KeySet{Symbol,Dict{Symbol,Any}})
This should work:
julia> filter(df) do row
           tokeep = true
           for k in keys(fd)
               if row[k] != fd[k]
                   tokeep = false
               end
           end
           return tokeep
       end
1×3 DataFrame
 Row │ a      b       c
     │ Int64  String  Float64?
─────┼─────────────────────────
   1 │     1  a            1.0
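The loop above can probably be condensed with `all`. This is only a minimal sketch, assuming the same `df` and `fd` as in the question. Note that it swaps `!=` for `isequal`, so a `missing` entry compares as `false` instead of propagating `missing`; that is my choice, not something the loop version does.

```julia
using DataFrames

df = DataFrame(a=[1,2,3], b=["a","b","c"], c=[1.,0,missing])
fd = Dict(:a=>1, :b=>"a")

# Keep a row only if every (column, value) pair in fd matches that row.
# isequal is used instead of == so that missing never leaks into the result.
result = filter(row -> all(isequal(row[k], v) for (k, v) in fd), df)
```

Iterating over `fd` directly yields `key => value` pairs, so there is no need to call `keys` and index back into the dictionary.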
I'm not sure how to do this using the `source => fun` syntax, though.
EDIT: Doing this with the `source => fun` syntax is very hard, but it will be easier in DataFrames 1.0 (upcoming in a few months), which contains the function `subset`.
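For reference, here is a sketch of how this might look with `subset`, assuming DataFrames 1.0 or later is installed and using the same `df` and `fd` as above. Each dictionary entry is turned into one `column => ByRow(isequal(value))` condition, and `subset` keeps the rows that satisfy all of them. The name `conditions` is just for illustration.

```julia
using DataFrames

df = DataFrame(a=[1,2,3], b=["a","b","c"], c=[1.,0,missing])
fd = Dict(:a=>1, :b=>"a")

# One condition per dictionary entry: the column must equal the stored value.
# isequal(v) returns a one-argument function; ByRow applies it element by element.
conditions = [k => ByRow(isequal(v)) for (k, v) in fd]

# subset combines all conditions with a logical AND.
result = subset(df, conditions...)
```

Because the conditions are built from the dictionary itself, this works unchanged for any number of keys.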
`filter` takes an array of columns as the left-hand side of the pair in the first argument. The problem with what you have above is that `keys` returns a `Base.KeySet` and `values` returns a `Base.ValueIterator`, and neither of those is an array. If you collect them into arrays with a comprehension, it should work. Using your code above:
df = DataFrame(a = 1:3, b = ["a", "b", "c"], c = [1., 0, missing])
fd = Dict(:a => 1, :b => "a")
### Changes start here
filter([key for key in keys(fd)] => ((a, b) -> [a, b] == [val for val in values(fd)]), df)
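The anonymous function above is written for exactly two columns (`a` and `b`). A possible generalization to any number of keys is sketched below, assuming the same `df` and `fd`; `ks` and `vals` are names I made up for the example. It relies on `filter` passing one positional argument per listed column, in the order the columns are listed.

```julia
using DataFrames

df = DataFrame(a = 1:3, b = ["a", "b", "c"], c = [1., 0, missing])
fd = Dict(:a => 1, :b => "a")

# Collect the keys once so the column order is fixed and reusable.
ks = collect(keys(fd))

# vals... receives the row's entries for the columns in ks, in the same order,
# so zipping ks with vals pairs each column name with its value in that row.
result = filter(ks => ((vals...) -> all(isequal(val, fd[k]) for (k, val) in zip(ks, vals))), df)
```

Looking values up with `fd[k]` avoids depending on `keys(fd)` and `values(fd)` iterating in matching order.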