Finding the position in array

Mastomaki · February 14, 2022, 12:44pm

using DataFrames

myd = DataFrame(a = 1:3, b = ["aba","saba","daba" ])

Now I am trying to find the position of each :b element in a certain vector and make a new column out of it:

myd = transform(myd, :b => (v -> findfirst(x -> x==v, ["saba","daba","myy","aba","oba"])) => :sort_order)

I end up with a column of nothing.

jakobnissen · February 14, 2022, 12:59pm

It’s because your function is wrongly specified: you check whether x == v, where x is a single string from the lookup vector and v is the whole column :b, so the comparison is never true.
You could do

v2 = ["saba","daba","myy","aba","oba"]
# for each element i of the column v, find its position in v2
transform(myd, :b => (v -> [findfirst(isequal(i), v2) for i in v]) => :sort_order)
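
To see why the original attempt yields nothing, here is a minimal sketch in plain Julia (no DataFrames needed). The names `col` and `lookup` are only illustrative: `col` stands in for the column `myd.b`.

```julia
col = ["aba", "saba", "daba"]                    # stands in for myd.b
lookup = ["saba", "daba", "myy", "aba", "oba"]

# A String is never == to a whole Vector, so no element of `lookup` matches
whole = findfirst(x -> x == col, lookup)         # nothing

# Comparing string to string gives one position per row
per_row = [findfirst(isequal(s), lookup) for s in col]   # [4, 1, 2]
```

Inside `transform`, the single `nothing` returned by the first form is then repeated for every row, which matches the column of nothing reported above.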

To be more efficient, you should probably create a Dict{String, Int} that maps strings to indices, then use that to look up. Something like

d = Dict(s => i for (i, s) in enumerate(v2))
myd.pos = [d[s] for s in myd.b]
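
A small sketch of the same Dict idea on plain vectors, assuming you also want to tolerate values that are not in the lookup vector. The vector `b` and the value "nope" are made up for illustration.

```julia
v2 = ["saba", "daba", "myy", "aba", "oba"]

# Map each string to its index. If a string were repeated in v2,
# the Dict would keep the last index, unlike findfirst.
d = Dict(s => i for (i, s) in enumerate(v2))

b = ["aba", "saba", "nope"]      # "nope" is not in v2

# get(d, key, default) avoids the KeyError that d[key] would throw
pos = [get(d, s, nothing) for s in b]    # [4, 1, nothing]
```

With `d[s]` as written above, a value missing from v2 raises a KeyError, which may or may not be what you want.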

rafael.guerra · February 14, 2022, 1:31pm

@Mastomaki, what result do you expect?

Mastomaki · February 14, 2022, 5:26pm

There should be a new column :sort_order with values 4, 1, 2 because, e.g., "aba" is the 4th element in
["saba","daba","myy","aba","oba"].

rafael.guerra · February 14, 2022, 6:23pm

So you already have one working answer above.

I would do:

v = ["saba","daba","myy","aba","oba"]
myd.sortorder = [findfirst(==(x), v) for x in myd.b]
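
As an alternative not mentioned above, Base also has `indexin`, which does the same per-element lookup in one call. A minimal sketch on plain vectors, with `col` standing in for `myd.b`:

```julia
v = ["saba", "daba", "myy", "aba", "oba"]
col = ["aba", "saba", "daba"]    # stands in for myd.b

# indexin(a, b) returns, for each element of a, its first index in b,
# or nothing where the element does not occur in b
order = indexin(col, v)          # [4, 1, 2]
```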

DataFrames · February 15, 2022, 3:57am

myd = DataFrame(a = 1:3, b = ["aba","saba","daba" ])
df = DataFrame(b = ["saba","daba","myy","aba","oba"], sort_order = 1:5)
leftjoin!(myd, df, on = :b)

Mastomaki · February 15, 2022, 6:42am

So what is the difference between isequal() and == ?

digital_carver · February 15, 2022, 7:09am

@Mastomaki
From the documentation:

isequal(x, y)

Similar to ==, except for the treatment of floating point numbers and of missing values. isequal treats all floating-point NaN
values as equal to each other, treats -0.0 as unequal to 0.0, and missing as equal to missing. Always returns a Bool value.

Basically, isequal is what you would intuitively expect equality to be. For numeric or data analysis purposes, NaN, 0.0 and -0.0 (which are different bit patterns), and missing have specific defined behaviour that makes them behave differently from other values. == follows that defined behaviour (NaN == NaN is false, for example), which is useful for those purposes.

But for other purposes, like using them as keys in a Dict, you usually want them to behave like any other value, and that’s why isequal exists and is used by Dict. isequal(NaN, NaN) returns true.
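
The cases from the docstring can be checked directly in the REPL. A short sketch (the variable names are only there to label the results):

```julia
nan_eq      = NaN == NaN                  # false
nan_iseq    = isequal(NaN, NaN)           # true

zero_eq     = 0.0 == -0.0                 # true
zero_iseq   = isequal(0.0, -0.0)          # false

miss_eq     = missing == missing          # missing, neither true nor false
miss_iseq   = isequal(missing, missing)   # true

# Dict keys are compared with isequal, so NaN works as a key
d = Dict(NaN => "not a number")
has_nan_key = haskey(d, NaN)              # true
```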

digital_carver · February 15, 2022, 7:12am

In this case though, that’s not what makes the difference. The code could just as well have used ==(i) in place of isequal(i) and behaved the same. The difference is that your version compared each string x to the whole of v, which is always false, whereas the working versions compare individual strings with each other.
