JuliaDB select/filter multiple string values in one column

Hello Julia Community,

I have a question regarding usage of JuliaDB.

My table consists of 5 columns, and I need to select/filter rows by one column that contains string values.

I am able to do it with one string value:

table = filter(x -> x == "LUX - All-in fee", df, select = :FEE_TYPE)

I have 3 additional string values and would like to match all of them in one line, as in the following Python/Pandas line:

df = df[df.FEE_TYPE.str.contains('LUX - All-in fee|LUX - IM fee|LUX - ManCo fee|LUX - Perf fee')]

I would appreciate any help from experienced Julia users.

Kind regards

Mac

There’s probably a function for that in JuliaDB, but with Base functions alone:

filtervals = ["LUX - All-in fee"; "LUX - IM fee"; "..."]
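# true when x equals any entry of filtervals (on Julia < 1.0 this was written contains(==, filtervals, x))
table = filter(x -> any(==(x), filtervals), df, select = :FEE_TYPE)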

@y4lu Thank you very much, it worked perfectly!

Ah, you’re welcome.
filter() is the correct function for this too (ref).

If anyone is interested, I also got it to work in one line with the boolean operator ||:

table = filter(x -> x == "LUX - All-in fee" || x == "LUX - IM fee" || x == "LUX - ManCo fee" || x == "LUX - Perf fee", df, select = :FEE_TYPE)

Cheers

Does it also work with the in operator?

To check that x is one of several possible strings, you could put the strings into an Array, for example:

filter(x -> x in ["LUX - All-in fee", "LUX - IM fee", "LUX - ManCo fee", "LUX - Perf fee"], df, select = :FEE_TYPE)
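For anyone who wants to try the predicate outside JuliaDB, here is a minimal sketch that uses only Base `filter` on a plain vector. The `fees` vector is made up for illustration; only the four LUX strings come from this thread.

```julia
# Hypothetical sample data standing in for the FEE_TYPE column.
fees = ["LUX - All-in fee", "LUX - IM fee", "UK - IM fee", "LUX - Perf fee", "LUX - Other"]

wanted = ["LUX - All-in fee", "LUX - IM fee", "LUX - ManCo fee", "LUX - Perf fee"]

# Three equivalent exact-match predicates.
by_in    = filter(x -> x in wanted, fees)
by_any   = filter(x -> any(==(x), wanted), fees)
by_curry = filter(in(wanted), fees)   # in(wanted) is shorthand for x -> x in wanted

# For long lists of values, a Set avoids scanning the whole list for every row.
by_set = filter(in(Set(wanted)), fees)
```

All four calls should keep the same three entries. The same predicates can be passed as the first argument of the JuliaDB `filter` call shown above.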

If instead you want to check that x contains one of those strings, it may be worth looking into regular expressions. For example:

filter(r"LUX - (All-in|IM|ManCo|Perf) fee", df, select = :FEE_TYPE)
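To make the difference between "is one of" and "contains one of" concrete, here is a small sketch on a plain vector using Base `occursin`. The sample strings with extra text around the fee name are invented for illustration.

```julia
# Hypothetical sample data; the variants with surrounding text are invented.
fees = ["LUX - All-in fee (Q1)", "LUX - IM fee", "UK - IM fee", "Rebate on LUX - Perf fee"]

pattern = r"LUX - (All-in|IM|ManCo|Perf) fee"

# occursin is true when the pattern matches anywhere inside the string.
substring_hits = filter(x -> occursin(pattern, x), fees)

# Exact membership keeps only strings identical to one of the listed values.
wanted = ["LUX - All-in fee", "LUX - IM fee", "LUX - ManCo fee", "LUX - Perf fee"]
exact_hits = filter(in(wanted), fees)
```

Here the regex version keeps three entries, while the exact-match version keeps only "LUX - IM fee". This mirrors what `str.contains` does in the Pandas line from the original question.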


The solution with the in operator is very nice and concise.

Thank you!

Hi everyone,

I also tried this solution, but it didn’t work for me.

using DataFrames, Pkg, CSV, Gadfly, HypothesisTests, Statistics

data = CSV.read("/Users/home/Documents/MP blog 2021/Data/UEFA champions league/data_2022_AH.csv", DataFrame, normalizenames=true)

first(data,5)

df_PS = select(data, :Equipo, :Score, :Remate, :Remate_arco, :Posesion, :Pases, :Precision_pases,  :Faltas, :Corners)

I was able to run my code properly.

But when I try this

filter_vals = ["Paris Saint-Germain"; "Sevilla"; "Manchester City"; "Ajax"]

table = filter(x-> contains(==, filter_vals, x), df_PS, select = :Equipo)

An error appears saying that select isn’t found.

Any help would be highly appreciated.

You aren’t working with JuliaDB, so you should probably start a new thread when asking about unrelated packages (in any case, it’s advisable to start a new thread rather than resurrect a four-year-old one).

That said, it sounds like you’re just looking for something like

df_PS[in(filter_vals).(df_PS.Equipo), :]
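The JuliaDB call fails here most likely because `filter` for a DataFrame does not take a `select` keyword; the column is given as part of the first argument instead. A minimal sketch of a few equivalent DataFrames.jl spellings follows, assuming a small made-up table in place of the CSV file (the team rows and scores are invented).

```julia
using DataFrames

# Hypothetical stand-in for df_PS; the values are invented for illustration.
df_PS = DataFrame(Equipo = ["Ajax", "Porto", "Sevilla", "Benfica"], Score = [3, 1, 2, 0])
filter_vals = ["Paris Saint-Germain", "Sevilla", "Manchester City", "Ajax"]

# Boolean-mask indexing: in(filter_vals) is broadcast over the column.
by_mask = df_PS[in(filter_vals).(df_PS.Equipo), :]

# filter takes the predicate as a column => function pair and applies it row by row.
by_filter = filter(:Equipo => in(filter_vals), df_PS)

# subset expects a function that returns a vector of Bools, hence ByRow.
by_subset = subset(df_PS, :Equipo => ByRow(in(filter_vals)))
```

All three should return the rows for Ajax and Sevilla, in the original row order.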

Thank you very much! Sorry for the confusion. If I have other questions, I will start a new thread.

It worked!
