using DataFrames
myd = DataFrame(a = 1:3, b = ["aba", "saba", "daba"])
Now I'm trying to find the position of each element of :b in a given vector and make a new column out of it:
myd = transform(myd, :b => (v -> findfirst(x -> x==v, ["saba","daba","myy","aba","oba"])) => :sort_order)
I end up with a column full of nothing.
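It's because your function is specified incorrectly: transform passes the whole column :b to your function as v, so x == v compares each string x of the lookup vector with the entire vector v. A string never equals a vector, so findfirst finds no match and returns nothing.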
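A minimal sketch in plain Julia (no DataFrame needed) of what goes wrong: the column arrives as one whole vector, so comparing each lookup string to it never succeeds. The names col and lookup are hypothetical stand-ins for myd.b and the literal vector.

```julia
# Hypothetical stand-ins: `col` plays the role of the column :b that
# transform passes to the function, `lookup` is the reference vector.
col = ["aba", "saba", "daba"]
lookup = ["saba", "daba", "myy", "aba", "oba"]

# The original function: compares every string x in `lookup` to the whole
# vector `col`. A String never equals a Vector, so no match is found.
buggy = (v -> findfirst(x -> x == v, lookup))(col)
println(buggy)  # nothing

# Corrected: look up each element of the column individually.
fixed = [findfirst(==(s), lookup) for s in col]
println(fixed)  # [4, 1, 2]
```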
You could do
v2 = ["saba","daba","myy","aba","oba"]
transform(myd, :b => (v -> [findfirst(isequal(i), v2) for i in v]) => :sort_order)
To be more efficient, you should probably create a Dict{String, Int} that maps each string to its index and use it for the lookups, since findfirst scans v2 linearly for every row. Something like:
d = Dict(s => i for (i, s) in enumerate(v2))
myd.pos = [d[s] for s in myd.b]
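One caveat with d[s]: it throws a KeyError if a string in the column is absent from v2. A sketch using get with a default of missing instead; the extra string "zzz" is hypothetical, added only to show the fallback.

```julia
v2 = ["saba", "daba", "myy", "aba", "oba"]
col = ["aba", "saba", "daba", "zzz"]  # "zzz" is not in v2 (hypothetical)

# Build the string => index map once, in a single pass over v2.
d = Dict(s => i for (i, s) in enumerate(v2))

# Each lookup is then an average O(1) hash lookup instead of a linear scan.
# get returns the default (missing) for absent keys instead of throwing.
pos = [get(d, s, missing) for s in col]
println(pos)
```

If v2 could contain duplicates, note that the Dict keeps the last index of a repeated string, whereas findfirst returns the first.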
@Mastomaki, what result do you expect?
There should be a new column :sort_order with values 4, 1, 2 because, e.g., "aba" is the 4th element of ["saba","daba","myy","aba","oba"].
So you already have one working answer above.
I would do:
v = ["saba","daba","myy","aba","oba"]
myd.sortorder = [findfirst(==(x), v) for x in myd.b]
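Here ==(x) is Julia's curried form of ==: called with one argument, it returns a predicate that tests whether its input equals x (isequal(x) works the same way). A small sketch:

```julia
v = ["saba", "daba", "myy", "aba", "oba"]

# ==("aba") builds a one-argument predicate, equivalent to y -> y == "aba"
is_aba = ==("aba")
println(is_aba("aba"))   # true
println(is_aba("oba"))   # false

# findfirst accepts any predicate; these three calls are equivalent.
i1 = findfirst(==("aba"), v)
i2 = findfirst(x -> x == "aba", v)
i3 = findfirst(isequal("aba"), v)
println((i1, i2, i3))    # (4, 4, 4)
```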
myd = DataFrame(a = 1:3, b = ["aba","saba","daba" ])
df = DataFrame(b = ["saba","daba","myy","aba","oba"], sort_order = 1:5)
leftjoin!(myd, df, on = :b)
So what is the difference between isequal() and == ?
@Mastomaki
From the documentation:
isequal(x, y)
Similar to ==, except for the treatment of floating point numbers and of missing values. isequal treats all floating-point NaN values as equal to each other, treats -0.0 as unequal to 0.0, and missing as equal to missing. Always returns a Bool value.
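The cases the documentation lists can be checked directly; a small sketch in plain Base Julia:

```julia
# NaN: == follows IEEE 754 (NaN is unequal to everything, even itself)
println(NaN == NaN)                 # false
println(isequal(NaN, NaN))          # true

# Signed zeros: numerically equal, but different bit patterns
println(0.0 == -0.0)                # true
println(isequal(0.0, -0.0))         # false

# missing: == propagates missing, isequal always returns a Bool
println(missing == missing)         # missing
println(isequal(missing, missing))  # true
```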
Basically, isequal is what you would intuitively expect equality to be. For numeric or data analysis purposes, NaN, 0.0 and -0.0 (which are different bit patterns), and missing have specific defined behavior that makes them behave differently from other values. == follows that defined behavior (for example, NaN == NaN is false, 0.0 == -0.0 is true, and missing == missing returns missing), which is useful for those purposes. But for other purposes, like using them as keys in a Dict, you usually want them to behave like any other value, and that's why isequal exists and is used by Dict. isequal(NaN, NaN) returns true.
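A short sketch of why Dict relies on isequal (together with hash): a NaN key can be found again, and 0.0 and -0.0 end up as two distinct keys.

```julia
d = Dict(NaN => "not a number")
# Lookup works because Dict compares keys with isequal, not ==
println(d[NaN])          # not a number
println(haskey(d, NaN))  # true

# 0.0 and -0.0 are not isequal, so they are stored as separate keys
z = Dict(0.0 => "positive zero", -0.0 => "negative zero")
println(length(z))       # 2
```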
In this case, though, that's not what makes the difference. Replacing isequal(i) with ==(i) in the code above would behave the same. The difference is that your version compared each string x to the whole column v, which is always false, whereas the comprehension iterates over the column v and looks up each of its individual values (strings) in v2.
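For strings, isequal and == agree, so the choice is cosmetic here. A sketch of cases where it would matter, using a hypothetical float vector that contains NaN and -0.0 (not data from the thread):

```julia
vals = [1.0, NaN, -0.0]   # hypothetical data

# == never matches NaN, so findfirst finds nothing
println(findfirst(==(NaN), vals))       # nothing
# isequal treats NaN as equal to NaN
println(findfirst(isequal(NaN), vals))  # 2

# With signed zeros it is the other way round: == matches -0.0 when
# searching for 0.0, isequal does not
println(findfirst(==(0.0), vals))       # 3
println(findfirst(isequal(0.0), vals))  # nothing

# For strings both predicates agree
strs = ["saba", "daba", "aba"]
println(findfirst(==("aba"), strs) == findfirst(isequal("aba"), strs))  # true
```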