Create a boolean mask between arrays of different sizes

Hi,

I would like to do the same thing as this Python code:

import numpy as np

a=[1,2,3,4,5,6,7]
b=[2,4,6]

mask=np.isin(a,b)

giving

>>> mask
array([False, True, False, True, False, True, False])

In Julia I can do:

mask = indexin(a,b)
7-element Vector{Union{Nothing, Int64}}:
  nothing
 1
  nothing
 2
  nothing
 3
  nothing

But I guess there must be a way to get a plain boolean array.

One way is

[x in b for x in a]

or, shorter, with dot broadcasting:

in.(a, Ref(b))
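For reference, here is a small self-contained check on the toy data from the question that the comprehension and the broadcast give the same mask. Ref(b) wraps b so that broadcasting treats it as a single scalar-like container instead of iterating over it alongside a. The third variant, turning the indexin result into a mask with isnothing, is an extra not mentioned in the thread.

```julia
a = [1, 2, 3, 4, 5, 6, 7]
b = [2, 4, 6]

# Array comprehension: one `in` test per element of `a`
mask1 = [x in b for x in a]

# Dot broadcasting: Ref(b) makes `b` act like a scalar, so `in` is applied
# to each element of `a` against the whole of `b`
mask2 = in.(a, Ref(b))

# From `indexin`: entries that are `nothing` correspond to values not in `b`
mask3 = .!isnothing.(indexin(a, b))

println(mask2)      # Bool[0, 1, 0, 1, 0, 1, 0]
println(a[mask2])   # [2, 4, 6]
```

Note that `in` on a plain Vector scans it element by element, so each test costs O(length(b)).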

Thanks!!!

In your example, a and b are both sorted. That permits a fast implementation that scans over both.

If that holds for your real data, consider such a scan instead of the one-liner.
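A minimal sketch of such a scan, assuming both vectors are sorted in ascending order and contain no NaN (the function name sorted_isin is hypothetical, not a Base function). Two indices walk the arrays together, so the whole mask costs O(length(a) + length(b)).

```julia
# Hypothetical helper: membership mask of `a` in `b`, ASSUMING both vectors
# are sorted in ascending order and contain no NaN.
# One simultaneous pass over both arrays: O(length(a) + length(b)).
function sorted_isin(a::AbstractVector, b::AbstractVector)
    mask = falses(length(a))
    j = firstindex(b)
    jmax = lastindex(b)
    for (i, x) in enumerate(a)
        # Skip elements of `b` smaller than `x`; they cannot match `x`
        # or any later (larger or equal) element of `a`.
        while j <= jmax && b[j] < x
            j += 1
        end
        # `b[j]` is now the first element of `b` that is >= x, if any.
        mask[i] = j <= jmax && b[j] == x
    end
    return mask
end

sorted_isin([1, 2, 3, 4, 5, 6, 7], [2, 4, 6])  # same mask as in.(a, Ref(b))
```

Duplicates in either array are handled, because `j` only advances past values strictly smaller than the current element of `a`.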

My actual arrays are not sorted.

I tried both solutions; they are very slow compared to Python.
a has around 25 million elements.
b has 200,000 elements.
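A rough operation count for these sizes shows why the one-liners are slow. It assumes in.(a, Ref(b)) does a linear scan of the unsorted Vector b for every element of a, which is how in works on a plain array, and compares that with binary search in a sorted copy of b and with a hash set (the two alternatives suggested further down the thread). These are order-of-magnitude estimates, not measurements.

```latex
\begin{aligned}
\text{linear scan:}\quad & |a|\,|b| = (2.5\times10^{7})(2\times10^{5}) = 5\times10^{12} \text{ comparisons}\\
\text{sort } b \text{, then binary search:}\quad & (|b| + |a|)\log_2|b| \approx (2.52\times10^{7})(17.6) \approx 4.4\times10^{8}\\
\text{hash set:}\quad & |b| + |a| \approx 2.5\times10^{7} \text{ expected hash operations}
\end{aligned}
```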

I think the best solution depends on what you eventually do with the mask. Can you show a minimal example of exactly what you are benchmarking in NumPy?

Oh, it’s quite simple: it’s just two arrays of floats.

In [1]: import numpy as np

In [2]: a = np.random.rand(25*10**6)

In [3]: b = np.random.rand(2*10**5)

In [4]: %timeit np.isin(a,b)
3.88 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Like so?

Also, I was asking what you do after you get the mask; depending on your ultimate goal, building a mask may not be the best approach.
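For example, if the end goal were only the elements of a that also occur in b, or just how many there are (hypothetical use cases; the question does not say), the mask never needs to be materialised. A sketch using a hash Set:

```julia
a = [1, 2, 3, 4, 5, 6, 7]
b = [2, 4, 6]

s = Set(b)   # hash set: expected O(1) membership tests

# Build the mask, then index with it ...
selected1 = a[in.(a, Ref(s))]

# ... or filter directly, without allocating a mask
selected2 = filter(x -> x in s, a)

# If only the count is needed, no array has to be allocated at all
n = count(x -> x in s, a)
```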

Or perhaps NumPy sorts b internally.

Edit: yes, it does sort it.

In Julia you can also just use insorted(). Note that the timing includes the time to sort b:

julia> @be insorted.(a, Ref(sort(b))) evals=10
Benchmark: 1 sample with 10 evaluations
        2.566 s (13 allocs: 6.040 MiB, 0.02% gc time)
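The same idea wrapped in a function, as a sketch (the name isin_via_insorted is hypothetical; insorted needs Julia 1.6 or later). Sorting a copy of b once costs O(|b| log |b|), and each insorted call is a binary search costing O(log |b|). The @be macro in the benchmark above appears to come from the Chairmarks.jl package and is not needed here.

```julia
# Hypothetical helper: sort a copy of `b` once, then binary-search it
function isin_via_insorted(a::AbstractVector, b::AbstractVector)
    sb = sort(b)                  # O(|b| log |b|); `b` itself is not modified
    return insorted.(a, Ref(sb))  # one binary search per element of `a`
end

a = rand(10^5)
b = vcat(rand(10^3), a[1:100:end])   # make sure some values of `a` are in `b`
mask = isin_via_insorted(a, b)
```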

Ah, thanks for the information; it’s indeed much faster.

An alternative, which is much faster on my machine:

in.(a, Ref(Set(b)))
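Why this is fast, plus one semantic difference worth knowing: building the Set is O(|b|) and each membership test is an expected O(1) hash lookup. A Set compares with isequal rather than ==, though, so for floats it treats NaN and signed zeros differently from in on a Vector. A small sketch with made-up values:

```julia
a = [1.0, 2.0, 3.0, 4.0]
b = [4.0, 2.0]

# Build the hash set once, then do one expected-O(1) lookup per element of `a`
mask = in.(a, Ref(Set(b)))   # matches in.(a, Ref(b)) for ordinary values

# Edge cases where they differ: Vector membership uses ==, Set uses isequal
@show NaN in [NaN]          # false, since NaN == NaN is false
@show NaN in Set([NaN])     # true, since isequal(NaN, NaN) is true
@show 0.0 in [-0.0]         # true, since 0.0 == -0.0
@show 0.0 in Set([-0.0])    # false, since isequal(0.0, -0.0) is false
```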