I am trying to filter a DataFrame based on the values in a certain column. This works fine when I use subset() with ByRow(in(["mytext"])), but I run into problems with regular expressions:
df = subset(df, :mycol => x -> ByRow(occursin(r"mytext", x) ))
ERROR: MethodError: no method matching occursin(::Regex, ::Array{String31,1})
Closest candidates are:
occursin(::Regex, ::SubString; offset) at regex.jl:176
occursin(::Regex, ::AbstractString; offset) at regex.jl:171
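The MethodError shows that occursin received the whole column (an Array) rather than a single string: in the call above, the anonymous function gets the full vector, and ByRow is applied to the result of occursin instead of to the function. A minimal sketch of the likely fix, assuming a small made-up table with a hypothetical column mycol that has no missing values:

```julia
using DataFrames

# Hypothetical example data; the column name mirrors the question
df = DataFrame(mycol = ["mytext here", "other", "more mytext"])

# Wrap the function itself in ByRow, so it is called once per element
res = subset(df, :mycol => ByRow(x -> occursin(r"mytext", x)))

# Same filter using the one-argument form of contains (Julia 1.5 or later),
# which returns a function of the string to be searched
res2 = subset(df, :mycol => ByRow(contains(r"mytext")))
```

Note that the one-argument form of occursin fixes the haystack rather than the pattern, so contains is the one to use for this shorthand.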
Thanks! I have a large dataset of >2M records, and I was under the impression that the target filter column was complete, but yes, it had a number of missing values! I have dropped the missing values and the subset works. PS: I'm newish to Julia and DataFrames and migrating from R.
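If dropping the rows up front is not desirable, the missing values can also be handled inside the filter. This is only a sketch on made-up data (the column name mycol is again hypothetical); it relies on the skipmissing keyword of subset, which treats a missing condition as false:

```julia
using DataFrames

# Hypothetical example data with one missing entry
df = DataFrame(mycol = ["mytext here", missing, "other"])

# Option 1: guard against missing inside the row-wise predicate
kept = subset(df, :mycol => ByRow(x -> !ismissing(x) && occursin(r"mytext", x)))

# Option 2: let the predicate return missing and ask subset to treat it as false
kept2 = subset(df,
    :mycol => ByRow(x -> ismissing(x) ? missing : occursin(r"mytext", x));
    skipmissing = true)
```

Both variants leave the original df untouched and drop the missing row only from the result.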
As a hint, most environments (REPL, VSCode, IJulia, Pluto…) will print the column type at the top of the table, underneath the column names. This will generally alert you to the presence of missing values, either with a question mark or an explicit Union{Missing, Float64} as the column type:
Notice the question mark in Int64? for the second column. This doesn’t necessarily mean that there are missing values, as the type won’t change after you’ve filtered missings out, but it can at least prompt you to check!
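To go from "the type allows missings" to "there actually are missings", a quick check along these lines may help. It is a sketch on a made-up two-column table (the names a and b are hypothetical), and the Int64 element type assumes a 64-bit system:

```julia
using DataFrames

# Hypothetical table; the second column allows missing values
df = DataFrame(a = [1.0, 2.0, 3.0], b = [1, missing, 3])

eltype(df.b)              # Union{Missing, Int64}, displayed as Int64?
n_missing = count(ismissing, df.b)   # how many entries are actually missing

# Filtering rows keeps the column type as it was
filtered = subset(df, :b => ByRow(!ismissing))
eltype(filtered.b)        # still Union{Missing, Int64}

# dropmissing narrows the type by default (disallowmissing = true)
dropped = dropmissing(df, :b)
eltype(dropped.b)         # Int64
```

So a question mark that survives filtering is expected with subset, while dropmissing removes both the rows and the question mark.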