How would I write this Pandas column filter code in Julia?

Julia1 · February 7, 2020, 4:54pm

Hi, how’s it going?

I’ve searched all over the place for regular expression column filtering in Julia, but I can’t seem to figure it out.

Say you have a variable regex that contains a regular expression string.

This works for me in Pandas:
cols = [val for val in df.columns if df[val].str.contains(regex).any()]

How would I do this in Julia?

aaowens · February 7, 2020, 5:05pm

Are you looking for columns whose name matches a regular expression?

That’s just df[!, r"x"]. See the Getting Started page of the DataFrames.jl documentation and look for the regular expression example.
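For the column-name case, here is a minimal sketch. It assumes DataFrames.jl 1.x, where `names` returns a `Vector{String}` and also accepts a regex selector. The data frame and its column names are made up for illustration:

```julia
using DataFrames

# Hypothetical data: two column names contain "x", one does not
df = DataFrame(x1 = 1:3, x2 = 4:6, y = 7:9)

sub = df[!, r"x"]          # DataFrame with only the columns whose *names* match r"x"
matched = names(df, r"x")  # just the matching names: ["x1", "x2"]
```

This approach selects by column name only. It does not look at the values stored in the columns.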

Julia1 · February 7, 2020, 5:36pm

No, I’m looking for columns that contain a row value matching the regular expression, not for matches on the column name itself.

dmolina · February 7, 2020, 6:45pm

It is simple:

cols = [k for k in names(df) if any(occursin.(r"...", df[:,k]))]

names(df) gives the column names.
occursin(r"…", string) tells you whether the regex matches somewhere inside the string. To apply it to a vector, you must broadcast it with the dot: occursin.(r"…", vector).
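The following small illustration contrasts the scalar call with the dotted (broadcast) call, using only Base Julia. The regex and strings are made up. The last line shows an alternative that is not from the post: passing a predicate to `any`.

```julia
# Hypothetical regex and data for illustration
re = r"ab+c"

scalar_hit = occursin(re, "xxabbcxx")  # one string: true

v = ["abc", "xyz", "abbbc"]
hits = occursin.(re, v)                # broadcast over the vector: [true, false, true]
anyhit = any(hits)                     # true

# Same answer without allocating the intermediate BitVector;
# `any` with a predicate also stops at the first match
anyhit2 = any(x -> occursin(re, x), v)
```

The predicate form matters mostly for long columns, where it avoids building a full Boolean vector only to reduce it.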

I recommend the official documentation and the tutorial by the same author.

aaowens · February 7, 2020, 7:31pm

I think this won’t work if some of the columns aren’t of string type, since occursin has no method for non-string values such as Float64. How about:

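using DataFrames

function hasmatch(col, regex)
    # Non-string columns are skipped instead of raising a MethodError
    eltype(col) <: AbstractString || return false
    # true if any value in the column matches the regex
    return any(x -> occursin(regex, x), col)
end
df = DataFrame(A = rand(10), B = "x", C = "y")
regex = r"x"
# Collects the matching columns themselves (here only B), not their names
cols = [col for col in eachcol(df) if hasmatch(col, regex)]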
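The comprehension above collects the column vectors, whereas the Pandas snippet produces column labels. Also, `eltype(col) <: AbstractString` is false for a column of type `Union{Missing, String}`, so string columns containing `missing` would be skipped. The following sketch covers both points. It assumes DataFrames.jl 1.x, where `names(df)` returns a `Vector{String}`. The data frame is hypothetical.

```julia
using DataFrames

# Variant of `hasmatch` that also accepts string columns containing `missing`
function hasmatch(col, regex::Regex)
    # nonmissingtype strips Missing, e.g. Union{Missing, String} -> String
    nonmissingtype(eltype(col)) <: AbstractString || return false
    # Ignore missing entries; stop at the first value that matches
    return any(x -> !ismissing(x) && occursin(regex, x), col)
end

# Hypothetical data: A is numeric, B contains a missing value, C is all "y"
df = DataFrame(A = rand(3), B = ["x", "y", missing], C = "y")
regex = r"x"

# Pair each name with its column so the result is a list of names, like the Pandas code
cols = [name for (name, col) in zip(names(df), eachcol(df)) if hasmatch(col, regex)]
```

Pairing `names(df)` with `eachcol(df)` via `zip` keeps the loop to a single pass over the columns.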

kevbonham · February 8, 2020, 1:54am

Could also add a conditional to @dmolina’s code:

cols = [k for k in names(df) if eltype(df[!,k]) <: AbstractString && any(occursin.(r"...", df[!,k]))]

Note: I also changed the column selection to df[!,k] rather than df[:,k], since the latter makes an (unneeded) copy of the column.
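The copy behaviour described in the note can be checked directly with object identity (`===`). The following minimal sketch assumes DataFrames.jl 1.x and uses a hypothetical one-column frame:

```julia
using DataFrames

df = DataFrame(a = ["x", "y"])  # hypothetical one-column frame

same   = df[!, :a] === df.a     # true: `!` returns the stored column itself
copied = df[:, :a] === df.a     # false: `:` returns a fresh copy
equal  = df[:, :a] == df.a      # true: the copy has the same contents
```

For a read-only scan such as a regex search, the result is the same either way. `!` simply avoids allocating a copy of every column.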
