Finding the position in array

myd = DataFrame(a = 1:3, b = ["aba","saba","daba" ])

Now I'm trying to find the position of each :b element in a given vector and make a new column out of it:

myd = transform(myd, :b => (v -> findfirst(x -> x==v, ["saba","daba","myy","aba","oba"])) => :sort_order)

I end up with a column of nothing.

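It’s because your function is wrongly specified: you check if x == v, where x is an element of the lookup vector and v is the whole column :b. A string never equals a vector, so findfirst never finds a match.
You could do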

v2 =  ["saba","daba","myy","aba","oba"]
transform(myd, :b => (v -> [findfirst(isequal(i), v2) for i in v]) => :sort_order)
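
A per-element lookup like this can also be written with ByRow from DataFrames.jl, which applies the function to each element of the column rather than to the column as a whole. This is only a sketch of that alternative, reusing the names myd and v2 from this thread:

```julia
using DataFrames

myd = DataFrame(a = 1:3, b = ["aba", "saba", "daba"])
v2 = ["saba", "daba", "myy", "aba", "oba"]

# ByRow calls the function once per row, so x is a single string here,
# not the whole :b column
result = transform(myd, :b => ByRow(x -> findfirst(==(x), v2)) => :sort_order)
```

With this form there is no need to write the comprehension over the column by hand.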

To be more efficient, you should probably create a Dict{String, Int} that maps strings to indices, then use that to look up. Something like

d = Dict(s => i for (i, s) in enumerate(v2))
myd.pos = [d[s] for s in myd.b]
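
One thing to keep in mind with the Dict approach is that d[s] throws a KeyError when s is not in v2, whereas findfirst returns nothing. A minimal sketch of a lookup that tolerates unknown values, assuming missing is an acceptable placeholder (the value "zzz" is made up for illustration):

```julia
v2 = ["saba", "daba", "myy", "aba", "oba"]
d = Dict(s => i for (i, s) in enumerate(v2))

# "zzz" is a hypothetical value that does not occur in v2
b = ["aba", "saba", "zzz"]

# get returns the supplied default instead of throwing a KeyError
pos = [get(d, s, missing) for s in b]
```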

@Mastomaki, what result do you expect?

There should be a new column :sort_order with values 4, 1, 2 because, e.g., "aba" is the 4th element in
["saba","daba","myy","aba","oba"].

So you already have one working answer above.

I would do:

v = ["saba","daba","myy","aba","oba"]
myd.sortorder = [findfirst(==(x), v) for x in myd.b]

Another option is to join against a lookup table:

myd = DataFrame(a = 1:3, b = ["aba","saba","daba" ])
df = DataFrame(b = ["saba","daba","myy","aba","oba"], sort_order = 1:5)
leftjoin!(myd, df, on = :b)

So what is the difference between isequal() and == ?

@Mastomaki
From the documentation:

isequal(x, y)

Similar to ==, except for the treatment of floating point numbers and of missing values. isequal treats all floating-point NaN
values as equal to each other, treats -0.0 as unequal to 0.0, and missing as equal to missing. Always returns a Bool value.

Basically, isequal is what you would intuitively expect equality to be. For numeric or data-analysis purposes, NaN, 0.0 and -0.0 (which have different bit patterns), and missing have specifically defined behavior that sets them apart from other values. == follows that defined behavior (NaN == NaN is false, for example), which is useful for those purposes.

But for other purposes, like using them as keys in a Dict, you usually want them to behave like any other value, and that’s why isequal exists and is used by Dict. isequal(NaN, NaN) returns true.
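
The differences described above can be checked directly at the REPL. A small sketch using only Base (the variable names are just for illustration):

```julia
# == follows IEEE 754 for floats and three-valued logic for missing
nan_eq     = (NaN == NaN)          # false
zero_eq    = (-0.0 == 0.0)         # true
missing_eq = (missing == missing)  # missing, not a Bool

# isequal always returns a Bool
nan_iseq     = isequal(NaN, NaN)          # true
zero_iseq    = isequal(-0.0, 0.0)         # false
missing_iseq = isequal(missing, missing)  # true

# Dict compares keys with isequal, so a NaN key can be found again
d = Dict(NaN => "not a number")
found = haskey(d, NaN)                    # true
```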

In this case though, that’s not what makes the difference. The code could just as well have used ==(i) instead of isequal(i) and behaved the same. The difference is that your original function compared each x to the whole column v, which is always false, whereas the fix compares the individual strings in the column against the elements of the lookup vector.
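
To make the failure concrete, here is a sketch without DataFrames, where col stands in for the whole :b column that transform passes to the function (col and lookup are names chosen for this example):

```julia
col = ["aba", "saba", "daba"]                    # what the function receives as v
lookup = ["saba", "daba", "myy", "aba", "oba"]

# Original pattern: every string x is compared with the entire column,
# so no element ever matches and findfirst returns nothing
whole = findfirst(x -> x == col, lookup)

# Per-element pattern: each string of the column is looked up on its own
per_element = [findfirst(==(s), lookup) for s in col]
```

Here whole is nothing, and transform repeats that single value for every row, which is where the column of nothing comes from.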