# Find position on list (fast)

Me vectors are very long and I am looking for efficient way to :
If
x=[“a”;“b”;“c”;“d”;“e”]
and
y=[“b”; “a”; “a”; “c”]

How to find , but very fast, position any y on x
expected result:

2
1
1
3

Thanks Pauls

Well, very fast is relative. The straightforward solution isn’t shabby ``````julia> using BenchmarkTools

julia> f1(x,y) = [findfirst(isequal(el), x) for el in y]
f1 (generic function with 1 method)

julia> @btime f1(\$x,\$y);
108.864 ns (1 allocation: 96 bytes)
``````

Depending on how much you (over-)simplified your MWE, you could even use `Char`s instead of `String`s to get a free speedup ``````julia> xc = collect('a':'e');

julia> yc = ['b', 'a','a', 'c'];

julia> @btime f1(\$xc,\$yc);
71.986 ns (1 allocation: 96 bytes)
``````

BTW, a similar alternative is

``````julia> f2(x,y) = [searchsortedfirst(x, el) for el in y]
f2 (generic function with 1 method)

julia> @btime f2(\$x,\$y);
146.619 ns (1 allocation: 96 bytes)

julia> @btime f2(\$xc,\$yc);
71.476 ns (1 allocation: 96 bytes)
``````
3 Likes

In case the MWE is oversimplified and the vectors are really big, I found a q&a on stackoverflow.com. And then there is certainly more of this.

If `x` is very large it will be more efficient to create a `Dict` mapping each element of `x` to its index.

4 Likes

Just to demonstrate @cjdoris’s great point:

``````julia> x = string.(Char.(1:1000));

julia> d = Dict(zip(x,1:length(x)));

julia> @btime f1(\$x,\$y);
2.621 μs (1 allocation: 96 bytes)

julia> f3(d,y) = [d[el] for el in y]
f3 (generic function with 1 method)

julia> @btime f3(\$d,\$y);
170.077 ns (1 allocation: 96 bytes)
``````

Constant lookup is awesome 1 Like

Remember that Julia has a built-in function for that, it is called `indexin`:

``````julia> x = ["a", "b", "c", "d", "e"]
5-element Vector{String}:
"a"
"b"
"c"
"d"
"e"

julia> y = ["b", "a", "a", "c"]
4-element Vector{String}:
"b"
"a"
"a"
"c"

julia> indexin(y, x)
4-element Vector{Union{Nothing, Int64}}:
2
1
1
3
``````

It may not be the fastest, but clean and easy to use.

7 Likes

If it’s not the fastest, file an issue (or make a PR). It should be.

2 Likes

How could I forget about it For comparison with the other variants (on my machine):

``````julia> f4(x,y) = indexin(y, x)
f4 (generic function with 1 method)

julia> @btime f4(\$x,\$y);
491.732 ns (5 allocations: 624 bytes)
``````

Definitely not the fastest!

1 Like

It does basically the same thing as `f3`. The big difference is that your timing of `f3` didn’t include the time to make the dictionary. (also the way `indexin` creates the dictionary isn’t especially efficient).

3 Likes

Fair enough. So it’s prioritizing the case of a large haystack. Would it perhaps make sense to decide on a strategy based on the length of the latter, i.e. use a polyalgorithm? Or do we just accept it (and only optimize the dict creation im possible)?

Probably worth a PR.

FWIW, for the inputs:

``````x = string.(Char.(1:1000))
y = string.(rand(Char.(1:1000), 500))
``````

@carstenbauer’s `f3(x,y)`, with the dictionary built-in, takes about 2/3 of the time of Base’s `indexin(x,y)`

2 Likes

If these are operations on big arrays and performance matters, you should really think about changing to a data structure that supports more efficient searching. Otherwise you are stuck with O(length(x)×length(y)) complexity. Like:

But ideally you should start with a data structure that supports the operations you want quickly, rather than starting with some other data structure and then converting (or constructing auxiliary dictionaries).

2 Likes

Not a fair comparison, because the `indexin` function creates a dictionary mapping elements to indices, so when you apply `@btime` to `indexin` you are including the time to create the dictionary. In contrast, in your `f3` benchmark you did not include the time to construct the `Dict`.

(Note also that `indexin` returns an array of `Union{Nothing,Int}` since it allows for the possibility that an element is missing from `x`, whereas your `f3` code threw an error in this case, and hence didn’t have to pay the price of a type-unstable array.)

2 Likes