Hi, how’s it going?
I’ve searched all over the place for regular expression column filtering in Julia, but I can’t seem to figure it out.
Say you have a variable regex that contains a regular expression string.
This works for me in Pandas:
cols = [val for val in df.columns if df[val].str.contains(regex).any()]
How would I do this in Julia?
Are you looking for columns whose name matches a regular expression?
If so, use df[!, r"x"]. See http://juliadata.github.io/DataFrames.jl/stable/man/getting_started/#Indexing-syntax-1 and look for the regular expression example.
No, I’m looking for columns that have a row value that matches the regular expression, not columns whose name matches.
It is simple:
cols = [k for k in names(df) if any(occursin.(r"...", df[:,k]))]
names(df) gives the column names.
occursin(r"…", string) indicates whether the regexp matches anywhere inside the string. To apply it to a vector, you must broadcast it with the “.”, as in occursin.(r"…", vector).
I recommend the official documentation, and the tutorial by the same author.
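To make the scalar versus broadcast distinction concrete, here is a minimal sketch using only Base Julia. The sample strings and the pattern r"x" are made up for illustration.

```julia
# Sample data, invented for this illustration.
strs = ["apple", "box", "cat"]

# Scalar call: does the pattern occur anywhere in one string?
single = occursin(r"x", "box")

# Broadcast call: the dot applies occursin to each element of the vector.
flags = occursin.(r"x", strs)

# any collapses the element-wise result into a single Bool.
has_match = any(flags)
```

Here flags holds one Bool per element, and has_match is what the comprehension above tests for each column.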
I think this won’t work if some of the columns aren’t of string type. How about:
using DataFrames

function hasmatch(col, regex)
    # Skip columns whose elements are not strings
    eltype(col) <: AbstractString || return false
    return any(x -> occursin(regex, x), col)
end

df = DataFrame(A = rand(10), B = "x", C = "y")
regex = r"x"
# Collect the names of the columns that contain a match
cols = [name for name in names(df) if hasmatch(df[!, name], regex)]
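One case the eltype check may not cover is a string column that contains missing values, since its element type is Union{Missing, String} rather than a subtype of AbstractString. A possible sketch, assuming you want to ignore the missing entries, follows; hasmatch_missing is a hypothetical name, and the sample data is made up.

```julia
using DataFrames

# Hypothetical variant of hasmatch that tolerates missing values.
function hasmatch_missing(col, regex)
    # nonmissingtype strips Missing from the element type before the check
    nonmissingtype(eltype(col)) <: AbstractString || return false
    # skipmissing iterates over the non-missing entries only
    return any(x -> occursin(regex, x), skipmissing(col))
end

# Invented sample data: B is a string column with a missing entry.
df = DataFrame(A = [1.0, 2.0], B = ["x", missing], C = ["y", "z"])
cols = [name for name in names(df) if hasmatch_missing(df[!, name], r"x")]
```

With this data, only column B should be selected.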
Could also add a conditional to @dmolina’s code:
cols = [k for k in names(df) if eltype(df[!,k]) <: AbstractString && any(occursin.(r"...", df[!,k]))]
Note: I also changed the column selection to df[!,k] rather than df[:,k], since the latter makes an (unneeded) copy of the column.
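The difference between the two selectors can be checked with a small sketch like this one, assuming a recent DataFrames.jl; the one-column data frame is made up for illustration.

```julia
using DataFrames

# Invented sample data.
df = DataFrame(A = [1, 2, 3])

# df[!, :A] returns the stored column vector itself, so two lookups are identical objects.
same_object = df[!, :A] === df[!, :A]

# df[:, :A] allocates a new vector, so it is equal in content but not the same object.
is_copy = df[:, :A] !== df[!, :A]
equal_content = df[:, :A] == df[!, :A]
```

Since the comprehension only reads the column, the copy made by df[:, k] is wasted work.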