Julia1
February 7, 2020, 4:54pm
1
Hi, how’s it going?
I’ve searched all over the place for regular expression column filtering in Julia, but I can’t seem to figure it out.
Say you have a variable regex that contains a regular expression string.
This works for me in Pandas:
cols = [val for val in df.columns if df[val].str.contains(regex).any()]
How would I do this in Julia?
Are you looking for columns whose name matches a regular expression?
That’s just df[!, r"x"]. See the “Getting Started” page of the DataFrames.jl documentation and look for the regular expression example.
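For reference, a minimal sketch of name-based selection, assuming DataFrames.jl 1.x (where names returns strings). The column names x1, x2, and y are made up for illustration.

```julia
using DataFrames

# Hypothetical example data: two columns whose names contain "x", one that does not.
df = DataFrame(x1 = 1:3, x2 = ["a", "b", "c"], y = [0.1, 0.2, 0.3])

# A Regex in the column position selects every column whose NAME matches it.
sub = df[!, r"x"]

selected = names(sub)  # names of the columns that were kept
```

Note that this only looks at column names, never at the values stored in the columns.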
Julia1
February 7, 2020, 5:36pm
3
No, I’m looking for columns that have a row value matching the regular expression, not columns whose name matches it.
It is simple:
cols = [k for k in names(df) if any(occursin.(r"...", df[:,k]))]
names(df) returns the column names.
occursin(r"…", string) returns true if the regex matches anywhere in the string. To apply it to a vector, broadcast it with the dot syntax: occursin.(r"…", vector).
I recommend the official documentation and the tutorial by the same author.
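To make the scalar versus broadcast distinction concrete, here is a small sketch in plain Julia. The strings and the pattern are invented for illustration.

```julia
# Hypothetical sample data and pattern.
strings = ["apple", "banana", "cherry"]
pattern = r"an"

# Scalar call: one regex, one string, returns a Bool.
single = occursin(pattern, "banana")

# Broadcast call: the dot applies occursin to every element and returns a BitVector.
hits = occursin.(pattern, strings)

# any(...) collapses the element-wise results into a single Bool.
found = any(hits)
```

The regex itself is treated as a scalar during broadcasting, so only the vector is iterated.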
I think this won’t work if some of the columns aren’t of string type. How about:
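using DataFrames

# Return true if `col` holds strings and at least one value matches `regex`.
function hasmatch(col, regex)
    # Skip non-string columns, where occursin would throw a MethodError.
    eltype(col) <: AbstractString || return false
    return any(x -> occursin(regex, x), col)
end

df = DataFrame(A = rand(10), B = "x", C = "y")
regex = r"x"
# Collect the names of the matching columns (here only B).
cols = [name for name in names(df) if hasmatch(df[!, name], regex)]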
You could also add a conditional to @dmolina’s code:
cols = [k for k in names(df) if eltype(df[!,k]) <: AbstractString && any(occursin.(r"...", df[!,k]))]
Note: I also changed the column selection to df[!,k] rather than df[:,k], since the latter makes an (unneeded) copy of the column.
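One caveat nobody mentioned above: a string column containing missing values has element type Union{Missing, String}, which is not a subtype of AbstractString, so the eltype check would skip it. Below is a rough sketch of one way to handle that, assuming DataFrames.jl 1.x and Julia 1.3 or later (for nonmissingtype). The helper name hasmatch_skipmissing and the sample data are hypothetical.

```julia
using DataFrames

# Hypothetical helper: true if any non-missing string value in `col` matches `regex`.
function hasmatch_skipmissing(col, regex)
    # Strip Missing from the element type before testing for strings.
    nonmissingtype(eltype(col)) <: AbstractString || return false
    # skipmissing avoids calling occursin on a missing value.
    return any(x -> occursin(regex, x), skipmissing(col))
end

# Made-up data: a numeric column, a plain string column, and two with missings.
df = DataFrame(A = [1.0, 2.0, 3.0],
               B = ["x", "y", "z"],
               C = ["p", missing, "q"],
               D = [missing, "xx", "r"])

# Names of the columns in which at least one value matches the regex.
cols = [name for name in names(df) if hasmatch_skipmissing(df[!, name], r"x")]
```

Here df[!, name] is used for the same reason as above: it accesses the column without copying it.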