Just a real bone question regarding DataFrames and how to create a new column containing integers to represent each unique string in another column. Sorry if my example is a bit rubbish, I’m new to Julia and not the best coder to begin with… Anyway, I have a DataFrame with a String31 column containing 53 unique site names. I am attempting to create another column containing integers (1:53), with each integer representing each unique site, if that makes sense. As an example (I don’t know how to add code):
    test[:, :site_int] = map(test[:, :site]) do b
        if b == "site_one"
            1
        elseif b == "site_two"
            2
        elseif b == "site_three"
            3
        else
            missing
        end
    end
And it seems to work, but I can’t work out how to loop it/vectorise it to number each of the 53 sites; I am guessing there is a more sophisticated way of achieving this than just typing out each site name and assigning a value.
I hope this makes sense and someone could give me a hand in working it out. Any help greatly appreciated.
Cheers
Gregg
Sorry, I forgot to give an example of what I want:
By default, levels are sorted in ascending order (but you can use any order you like; see the levels! function).
It is likely that you actually want a CategoricalVector created with categorical(test.site) rather than an integer code. What do you want to use these integer codes for?
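For illustration, here is a minimal sketch of the categorical route, assuming CategoricalArrays.jl is installed alongside DataFrames.jl. The four-row `test` frame and the site names are made-up stand-ins for the real 53-site data, and this is not necessarily the exact code from the original reply.

```julia
using DataFrames, CategoricalArrays

# Hypothetical stand-in for the real data (assumption: four rows, three sites)
test = DataFrame(site = ["site_two", "site_one", "site_three", "site_one"])

# Build a CategoricalVector; levels are sorted in ascending order by default
site_cat = categorical(test.site)

# levelcode gives each value's position in levels(site_cat)
codes_sorted = levelcode.(site_cat)

# Optionally impose a custom level order, then read the codes again
levels!(site_cat, ["site_one", "site_two", "site_three"])
codes_custom = levelcode.(site_cat)

test.site_int = codes_custom
```

With the default ordering the levels are sorted as strings, so "site_three" comes before "site_two"; calling `levels!` is one way to get the numbering you expect.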
If you just want integer indices (and do not need/like CategoricalArrays.jl functionality) you can do:
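One possible way to do this with Base Julia only is sketched below. It is an illustrative approach, not necessarily the one from the original reply, and the `sites` vector is a made-up stand-in for `test.site`.

```julia
# Hypothetical stand-in for test.site (assumption: plain vector of site names)
sites = ["site_two", "site_one", "site_three", "site_one"]

# Unique names in order of first appearance; use sort(unique(sites)) for sorted order
site_names = unique(sites)

# Map each name to its position in site_names
lookup = Dict(name => i for (i, name) in enumerate(site_names))

# Look up the integer code for every row
site_int = [lookup[s] for s in sites]

# With a DataFrame, the result could then be stored as: test.site_int = site_int
```

Here the numbering follows the order in which sites first appear in the column, so the same data sorted differently would give different codes.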
Hi, that’s boss, cheers man. Yeah, the site names are not ordinal, I just want to represent them with a number. I am trying to recode some of my old hierarchical models from R in Julia/Turing. This model, in R, is a GLMM with site as a random effect, but when I try it using Turing it fails because it can’t check the index bounds of strings; I assumed it required an integer value, and it does work when I use that format.
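To show why integer codes help here, a rough Base Julia sketch (no Turing involved): a vector of per-site effects can be indexed by integer codes but not by strings. The names `alpha`, `intercept` and `mu`, and all the values, are hypothetical.

```julia
# Hypothetical integer site codes for four observations
site_int = [1, 2, 3, 2]

# Hypothetical per-site random intercepts, one entry per site
alpha = [0.5, -1.2, 0.3]
intercept = 2.0

# Integer indexing picks the right site effect for every observation
mu = intercept .+ alpha[site_int]

# Indexing the same vector with a site name is not valid and throws an error
string_index_fails = try
    alpha["site_one"]
    false
catch
    true
end
```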
@spk - please update your DataFrames.jl installation. You must be on some old version of the package. If you used DataFrames.jl 1.4 release it would work.