How would I write this Pandas column filter code in Julia?

Hi, how’s it going?

I’ve searched all over the place for regular expression column filtering in Julia, but I can’t seem to figure it out.

Say you have a variable regex that contains a regular expression string.

This works for me in Pandas:
cols = [val for val in df.columns if df[val].str.contains(regex).any()]

How would I do this in Julia?

Are you looking for columns whose name matches a regular expression?

That’s just df[!, r"x"]. Go here http://juliadata.github.io/DataFrames.jl/stable/man/getting_started/#Indexing-syntax-1 and look for the regular expression example.
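For reference, a minimal sketch of name-based selection, assuming DataFrames.jl 1.x. The table and its column names (x1, x2, y) are made up for illustration:

```julia
using DataFrames

# Hypothetical example table; the column names are illustrative only.
df = DataFrame(x1 = 1:3, x2 = 4:6, y = ["a", "b", "c"])

# A Regex used as the column selector is matched against column *names*.
sub = df[!, r"x"]
selected = names(sub)        # ["x1", "x2"]

# names also accepts a Regex and returns only the matching names.
matching = names(df, r"x")   # ["x1", "x2"]
```

Note that this never looks at the values stored in the columns, only at their names.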

No, I’m looking for columns that have a row value matching the regular expression, not columns whose name matches.

It is simple:

cols = [k for k in names(df) if any(occursin.(r"...", df[:,k]))]

names(df) returns the column names.
occursin(r"…", string) returns true if the regular expression matches anywhere in the string. To apply it to every element of a vector, broadcast it with the dot syntax: occursin.(r"…", vector).
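A small sketch of the scalar and broadcast forms in plain Julia, with no packages needed. The vector `words` and the pattern r"x" are placeholders chosen for illustration:

```julia
# Hypothetical sample data for illustration.
words = ["apple", "box", "pear"]

# Scalar call: does the pattern occur anywhere in this one string?
single = occursin(r"x", "box")        # true

# Broadcast call: the dot applies occursin to each element of the vector.
matches = occursin.(r"x", words)      # [false, true, false]

# any collapses the element-wise result into a single Bool.
found = any(matches)                  # true

# Equivalent form that avoids allocating the intermediate array.
found_lazy = any(s -> occursin(r"x", s), words)
```

The last form stops at the first match, which can matter for long columns.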

I recommend the official documentation and the tutorial by the same author.

I think this won’t work if some of the columns aren’t of string type. How about:

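using DataFrames

# Return true if `col` holds strings and at least one element matches `regex`.
function hasmatch(col, regex)
    # Skip non-string columns, where occursin would throw a MethodError.
    eltype(col) <: AbstractString || return false
    return any(x -> occursin(regex, x), col)
end

df = DataFrame(A = rand(10), B = "x", C = "y")  # scalars are repeated to length 10
regex = r"x"
# Collects the matching column vectors; loop over names(df) to collect names instead.
cols = [col for col in eachcol(df) if hasmatch(col, regex)]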

Could also add a conditional to @dmolina’s code:

cols = [k for k in names(df) if eltype(df[!,k]) <: AbstractString && any(occursin.(r"...", df[!,k]))]

Note: I also changed the column selection to df[!,k] rather than df[:,k], since the latter makes an unneeded copy of the column.
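To see the difference between the two selectors, here is a quick check, assuming DataFrames.jl 1.x. The table is a made-up example:

```julia
using DataFrames

# Hypothetical example table.
df = DataFrame(A = ["x", "y"], B = [1, 2])

nocopy = df[!, :A]   # the vector stored inside the data frame
copied = df[:, :A]   # a freshly allocated copy of that vector

same_object = nocopy === df.A    # true: identical object, no allocation
is_copy     = copied !== df.A    # true: different object...
same_values = copied == df.A     # true: ...with equal contents
```

For a read-only scan like the regex check, the non-copying df[!,k] is enough; the copy only matters if you plan to mutate the result without touching df.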