Better Way to Filter Strings from DataFrame?

r0b0ty · September 29, 2022, 10:10pm

I have a DataFrame containing a column of strings and a separate, standalone Vector of strings. I want a resulting DataFrame with only the rows whose text contains any of the strings in the standalone vector… basically, check the text column against each string entry in the vector.

I did this, but it seems overly convoluted for the task. Is there a simpler or more efficient way?

Thanks!

julia> df = DataFrame(:number => [1, 2, 3, 4], :text => ["Green Car", "Purple Grape", "Yellow Banana", "Purple Bruise"])
4×2 DataFrame
 Row │ number  text
     │ Int64   String
─────┼───────────────────────
   1 │      1  Green Car
   2 │      2  Purple Grape
   3 │      3  Yellow Banana
   4 │      4  Purple Bruise

julia> keyword = ["Green", "Brown", "Black", "Blue"]
4-element Vector{String}:
 "Green"
 "Brown"
 "Black"
 "Blue"

julia> result = DataFrame(:number => Integer[], :text => String[])
0×2 DataFrame

julia> for color in keyword
           append!(result, df[contains.(df.text, color), :])
       end

julia> result
1×2 DataFrame
 Row │ number   text
     │ Integer  String
─────┼────────────────────
   1 │       1  Green Car

AndiMD · September 29, 2022, 11:05pm

Hi!
How about this:
filter(row -> any(occursin.(keyword, row.text)), df)
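
The `any` matters because `occursin.(keyword, row.text)` returns one Bool per keyword, while `filter` needs a single Bool per row. As a minimal sketch of two equivalent spellings, assuming DataFrames.jl 1.x: the `:text => predicate` form of `filter` passes only that column's value to the function, and `subset` with `ByRow` applies the same predicate row by row. The helper name `has_keyword` is made up for illustration.

```julia
using DataFrames

df = DataFrame(:number => [1, 2, 3, 4],
               :text => ["Green Car", "Purple Grape", "Yellow Banana", "Purple Bruise"])
keyword = ["Green", "Brown", "Black", "Blue"]

# Hypothetical helper: true if any keyword occurs in the string `t`.
# any(f, itr) stops at the first match instead of building a Bool vector.
has_keyword(t) = any(k -> occursin(k, t), keyword)

# Pass only the :text column to the predicate instead of the whole row.
by_filter = filter(:text => has_keyword, df)

# Same selection with subset; ByRow applies the predicate to each element.
by_subset = subset(df, :text => ByRow(has_keyword))
```

Selecting the column up front is usually faster than the `row -> ...` form on large tables, since the predicate receives plain values rather than a row object.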

r0b0ty · September 29, 2022, 11:33pm

Amazingly simple, thank you! I had tried a variation of that, but without the any() function, which is probably why it didn’t work.

FYI, I tried it with the contains() function too and it worked - I wouldn’t have thought to use occursin().

filter(row -> any(occursin.(keyword, row.text)), df)
1×2 DataFrame
 Row │ number  text      
     │ Int64   String    
─────┼───────────────────
   1 │      1  Green Car
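
One detail when swapping the two functions: `occursin` and `contains` take their arguments in opposite order. A small sketch using only Base Julia (the variable names `a`, `b`, `c` are just for illustration):

```julia
keyword = ["Green", "Brown", "Black", "Blue"]
text = "Green Car"

# occursin(needle, haystack): the substring comes first.
a = any(occursin.(keyword, text))

# contains(haystack, needle): same test, arguments swapped (Julia 1.5+).
b = any(contains.(text, keyword))

# Keeping occursin's order with contains asks whether each keyword
# contains the whole text, which is false for every entry here.
c = any(contains.(keyword, text))
```

Here `a` and `b` are both `true`, while `c` is `false`.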

pdeffebach · September 30, 2022, 1:28pm

With DataFramesMeta.jl one solution is

julia> @rsubset df any(occursin.(keyword, :text))
1×2 DataFrame
 Row │ number  text      
     │ Int64   String    
─────┼───────────────────
   1 │      1  Green Car
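
A caveat worth spelling out, as a sketch assuming DataFramesMeta.jl is installed: inside `@subset`, `:text` stands for the whole column, so `occursin.(keyword, :text)` would pair `keyword[i]` with `text[i]` and only lines up when both have the same length. The row-wise macro `@rsubset` hands over one string at a time, so every keyword is tested against every row. The two-element `keyword` below is made up to show the difference.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(:number => [1, 2, 3, 4],
               :text => ["Green Car", "Purple Grape", "Yellow Banana", "Purple Bruise"])
keyword = ["Purple", "Green"]  # hypothetical list, shorter than the column

# Column-wise broadcasting pairs elements up, so mismatched lengths fail:
# occursin.(keyword, df.text)  # throws DimensionMismatch

# Row-wise: :text is a single String here, so all keywords are checked per row.
matches = @rsubset df any(occursin.(keyword, :text))
```

With this keyword list, `matches` keeps rows 1, 2, and 4.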
