Julia1 · February 7, 2020, 4:54pm · #1
Hi, how’s it going?
I’ve searched all over the place for how to filter columns with a regular expression in Julia, but I can’t figure it out.
Say you have a variable regex that contains a regular expression string.
This works for me in pandas:
cols = [val for val in df.columns if df[val].str.contains(regex).any()]
How would I do this in Julia?
            
              Are you looking for columns whose name matches a regular expression?
That’s just df[!, r"x"]. See the Getting Started page of the DataFrames.jl documentation and look for the regular expression example.
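For reference, here is a minimal sketch of that name-based selection, assuming DataFrames.jl is installed. The data frame and its column names are made up for illustration.

```julia
using DataFrames

# Hypothetical data; only the column names matter here.
df = DataFrame(x1 = 1:3, x2 = 4:6, y = ["a", "b", "c"])

# A Regex used as the column selector keeps the columns whose *names* match.
sub = df[!, r"x"]

# Column names of the selection.
selected = names(sub)
```

Note that this looks only at column names, never at the values stored in the columns.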
              
Julia1 · February 7, 2020, 5:36pm · #3
No, I’m looking for columns that have a row value that matches the regular expression, not columns whose name matches.
            
It is simple:
cols = [k for k in names(df) if any(occursin.(r"...", df[:,k]))]

names(df) gives the column names.
occursin(r"...", string) tells you whether the regex matches anywhere in the string. To apply it to a vector, broadcast it with the dot syntax, occursin.(...).
I recommend the official documentation and the tutorial by the same author.
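To make the difference between the scalar and the broadcast call concrete, here is a small sketch using only Base Julia. The pattern and the strings are arbitrary examples, not taken from the question.

```julia
# occursin on a single string returns a Bool.
single = occursin(r"ab+", "xabbz")

# Broadcasting with the dot applies it elementwise to a vector of strings.
fruits = ["apple", "cabbage", "kiwi"]
flags = occursin.(r"ab+", fruits)

# any(...) collapses the elementwise result to one Bool for the whole vector.
has_match = any(flags)
```

Here only "cabbage" contains an "a" followed by at least one "b", so flags has a single true entry.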
            
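I think this won’t work if some of the columns aren’t of string type. How about:
using DataFrames

# Return true if `col` holds strings and at least one entry matches `regex`.
function hasmatch(col, regex)
    eltype(col) <: AbstractString || return false  # skip non-string columns
    return any(x -> occursin(regex, x), col)
end
df = DataFrame(A = rand(10), B = "x", C = "y")
regex = r"x"
# Collects the matching column vectors themselves, not their names.
cols = [col for col in eachcol(df) if hasmatch(col, regex)]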
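One possible variation, sketched here as an assumption and not as anything proposed in the thread: collect the column names (as the pandas snippet does) and tolerate missing values in string columns. The helper name hasmatch_skipmissing and the sample data are hypothetical.

```julia
using DataFrames

# Assumption: a column counts as string-like if its element type is a string
# type once Missing is stripped; missing entries are skipped when matching.
function hasmatch_skipmissing(col, regex)
    nonmissingtype(eltype(col)) <: AbstractString || return false
    return any(x -> occursin(regex, x), skipmissing(col))
end

# Hypothetical data: one numeric column, one string column with a missing
# value, and one string column without a match.
df = DataFrame(A = [1.0, 2.0, 3.0], B = ["x", missing, "z"], C = ["y", "y", "y"])
regex = r"x"

# Collect the names of matching columns rather than the column vectors.
matching = [name for name in names(df) if hasmatch_skipmissing(df[!, name], regex)]
```

A plain eltype(col) <: AbstractString check would reject column B above, because its element type is Union{Missing, String}.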
            
You could also add a condition to @dmolina’s code:
cols = [k for k in names(df) if eltype(df[!,k]) <: AbstractString && any(occursin.(r"...", df[!,k]))]

Note: I also changed the column selection to df[!,k] rather than df[:,k], since the latter makes an (unneeded) copy of the column.
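The copy versus no-copy behaviour can be checked with object identity. This is a small sketch assuming DataFrames.jl is installed; the data is made up.

```julia
using DataFrames

df = DataFrame(A = ["x", "y"])

# df[!, :A] returns the stored column vector itself, so two lookups are identical.
same_object = df[!, :A] === df[!, :A]

# df[:, :A] allocates a fresh copy: equal in content, but a different object.
copy_is_distinct = df[:, :A] !== df[!, :A]
copy_is_equal = df[:, :A] == df[!, :A]
```

For a read-only scan like the regex check, the no-copy form avoids allocating a new vector per column.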