This function converts a categorical vector to a vector of integer level indices:

```
function convert_factor_to_index(cat_vec::CategoricalVector)::Vector{Int}
    levs = levels(cat_vec)
    int_vec::Vector{Int} = findfirst.(isequal.(cat_vec), Ref(levs))
    #! int_vec = findfirst.([isequal.(catv)[i].(levs) for i in 1:length(levs)])
    return int_vec
end # convert_factor_to_index
```

An example:

```
julia> v = ["A", "B", "C"];

julia> convert_factor_to_index(categorical(v))
3-element Vector{Int64}:
1
2
3
```

So the function works, but I am not clear why it needs the `Ref`. (It does not work without it.)

Also, it runs much slower with the following line of code:

```
int_vec = findfirst.([isequal.(catv)[i].(levs) for i in 1:length(levs)])
```

I understand broadcasting is fast, but does `Ref` also help to speed things up?

nilshg
June 14, 2023, 9:40am
2
Are you maybe looking for the `levelcode` function in `CategoricalArrays`?

```
julia> levelcode.(categorical(v))
3-element Vector{Int64}:
1
2
3
```
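As a rough check that `levelcode` gives the same answer as the search-based function, here is a minimal sketch. The data is made up for illustration (it is not from the original post), and it assumes the default behaviour of `categorical`, which sorts the levels of a string vector. Note that `levelcode` is broadcast over the categorical vector, not over the raw strings.

```julia
using CategoricalArrays

v = ["B", "A", "C", "A"]      # illustrative data, not from the thread
cat_vec = categorical(v)      # levels are sorted by default: ["A", "B", "C"]

# Built-in: the integer code of each element's level
codes = levelcode.(cat_vec)

# Search-based approach from the question, for comparison
by_search = findfirst.(isequal.(cat_vec), Ref(levels(cat_vec)))

println(codes)                # [2, 1, 3, 1]
println(codes == by_search)   # true
```

Since `levelcode` just reads the stored code instead of searching the levels for every element, it should be the cheaper option on longer vectors.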

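To your other question: no, `Ref` is not speeding anything up. It is simply required to protect `levs` from broadcasting, so that the whole vector is passed to every `findfirst` call instead of being iterated element by element. You could equally have wrapped it in a single-element `Tuple`, like `(levs,)`.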
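To make the "protect from broadcasting" point concrete, here is a small sketch using plain string vectors (no `CategoricalArrays` needed). The names `xs` and `levs` are just illustrative. The two inputs deliberately have different lengths, so the unprotected call fails right away with a `DimensionMismatch`.

```julia
levs = ["A", "B", "C"]
xs   = ["C", "A"]             # deliberately a different length than levs

# Ref(levs) and (levs,) are both treated as scalars by broadcasting,
# so every call receives the whole levs vector.
idx_ref   = findfirst.(isequal.(xs), Ref(levs))
idx_tuple = findfirst.(isequal.(xs), (levs,))

# Equivalent comprehension: one findfirst call per element of xs
idx_comp = [findfirst(isequal(x), levs) for x in xs]

# Without protection, broadcasting tries to pair xs[i] with levs[i];
# the lengths (2 and 3) do not match, so it throws.
err = try
    findfirst.(isequal.(xs), levs)
    nothing
catch e
    e
end

println(idx_ref)                       # [3, 1]
println(err isa DimensionMismatch)     # true
```

All three protected variants do the same work, which is why `Ref` by itself has no effect on speed.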


HanD
June 14, 2023, 1:13pm
3
This doesn’t seem right:

```
int_vec = findfirst.([isequal.(catv)[i].(levs) for i in 1:length(levs)])
```

It’s as if you tried to replace broadcasting with list comprehension, but something went awry. I think you meant something like this:

```
int_vec = [findfirst(isequal(cat), levs) for cat in cat_vec]
```

At least that is what the initial broadcast is equivalent to. The `Ref` merely protects the `levs` argument from being broadcast.

Sifting through the available functions, I see that there are `pool()` and `refs()` to get the data of interest from a categorical vector:

```
julia> using CategoricalArrays, BenchmarkTools
julia> function convert_factor_to_index(cat_vec::CategoricalVector)::Vector{Int}
levs = levels(cat_vec)
int_vec::Vector{Int} = findfirst.(isequal.(cat_vec), Ref(levs))
#! int_vec = findfirst.([isequal.(catv)[i].(levs) for i in 1:length(levs)])
return int_vec
end # convert_factor_to_index
convert_factor_to_index (generic function with 1 method)
julia> v = ["A", "B", "C"]
3-element Vector{String}:
"A"
"B"
"C"
julia> @btime convert_factor_to_index(categorical(v))
378.325 ns (12 allocations: 944 bytes)
3-element Vector{Int64}:
1
2
3
julia> @btime CategoricalArrays.refs(categorical($v))
285.036 ns (10 allocations: 848 bytes)
3-element Vector{UInt32}:
0x00000001
0x00000002
0x00000003
julia> @btime CategoricalArrays.pool(categorical($v))
286.594 ns (10 allocations: 848 bytes)
CategoricalPool{String, UInt32}(["A", "B", "C"])
```

The code I proposed works if one replaces `catv` with `cat_vec`:

```
julia> v = ["A", "B", "C"];

julia> cat_vec = categorical(v);

julia> levs = levels(cat_vec);

julia> int_vec = findfirst.([isequal.(cat_vec)[i].(levs) for i in 1:length(levs)])
3-element Vector{Int64}:
1
2
3
```

That is what I needed. Thanks.

Soldalma:

`isequal.(cat_vec)[i]`

This is definitely less efficient than `isequal(cat_vec[i])`.
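A minimal sketch of the difference, again with plain strings and illustrative names (`xs`, `levs`): `isequal.(xs)[i]` builds a predicate for every element of `xs` and then keeps only the `i`-th one, and it does so on every iteration, whereas `isequal(xs[i])` builds just the one that is needed. The loop below runs over the elements of `xs`; a range of `1:length(levs)` only gives the same result when the vector happens to have as many elements as levels, as in the three-element example.

```julia
levs = ["A", "B", "C"]
xs   = ["B", "A", "C", "A"]   # more elements than levels, on purpose

# Slow variant: allocates length(xs) predicates per iteration, uses one
slow = [findfirst(isequal.(xs)[i].(levs)) for i in eachindex(xs)]

# Cheaper variant: one predicate per iteration, no temporary arrays
fast = [findfirst(isequal(xs[i]), levs) for i in eachindex(xs)]

println(slow == fast)         # true
println(fast)                 # [2, 1, 3, 1]
```

The slow variant does work proportional to `length(xs)^2` overall, so the gap should widen quickly as the vector grows.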
