# Julian way of MATLAB ismember function?

https://www.mathworks.com/help/matlab/ref/ismember.html

Let two arrays `a = ["1", "4", "7"]` and `b = ["1","2","3,"4","5"]`.

In MATLAB, `[Lia,Locb] = ismember(a, b)` will lead to

``````Lia = [1, 1, 0]
Locb = [1, 4, 0]
``````

Quickly looking over docs, I coudn’t find a Julian equivalent of `ismember`.
Instead I can think two alternative ways.

``````# Way 1: Using comprehending lists
Lia = [each_item in b for each_item in a]
Locb = [findfirst(b, each_item) for each_item in a]

# Way 2: Vectorization (edited)
Lia = broadcast(x -> in(x, b), a)
Locb = broadcast(x -> findfirst(b, x), a)
``````

Any better ways?

1 Like

If you want it to be short you can do

``````Lia = in.(a, [b])
LocB = findfirst.([b], a)
``````

In practice I’d look at what the results of `ismember` are being used for. It’s quite possible that in Julia you wouldn’t need to form those vectors explicitly.

4 Likes

I found that there is an `indexin` function. It’s not exactly same as matlab, but…

``````indexin(a, b)

Returns a vector containing the highest index in b for each value in a that is a member of b . The output vector
contains 0 wherever a is not a member of b.

julia> a = ['a', 'b', 'c', 'b', 'd', 'a'];

julia> b = ['a','b','c'];

julia> indexin(a,b)
6-element Array{Int64,1}:
1
2
3
2
0
1

julia> indexin(b,a)
3-element Array{Int64,1}:
6
4
3
``````

As wonjinChoi said, `indexin` should do the heavy lifting. And `Lia` can be calculated from `LocB` to save time. In code:
`LocB = indexin(a,b) Lia = 1.*(LocB .!= 0)`

this gives the following:

``````julia> a = ['1', '4', '7']; b = ['1','2','3','4','5'];

julia> LocB = indexin(a,b)
3-element Array{Int64,1}:
1
4
0

julia> Lia = 1.*(LocB .!= 0)
3-element Array{Int64,1}:
1
1
0
``````

Thank you for all suggestions.

@GunnarFarneback I’m porting a MATLAB toolbox (COBRA Toolbox; https://github.com/opencobra/cobratoolbox/) into Julia as a practice. The whole story is too long… In this toolbox, this function is employed to check whether a newly added row(s) (metabolites) already exists in the sparse matrix.

Is it just by accident that your two input arrays in your example are sorted? If you can guarantee sort order of your inputs then typically you can solve this problem much faster that using `in`, `indexin`, or `findfirst` (or `ismember` in Matlab). For the case where one of your input arrays is very small relative to the other you can use `searchsorted` on each element of the small array, so you get `k` calls to an O(ln(n)) algorithm. For the case where both input arrays are of similar size you can use a simultaneous single loop over both arrays, hence the algorithm is O(n). There are probably several examples of this algorithm online but if you want one in Julia I have it so let me know and I’ll post it here.

Ahh, that was just by accident. Elements of the array do not need to be sorted.
I learned some basics of algorithms and Big-Oh notations but have almost forgotten If you would show me some examples, I’d really appreciate it.

There’s no way I’ll ever manage to give a better explanation of Big-Oh notation than you’ll find here:

FYI, given that you can’t guarantee sort order of inputs, the answers provided by others here are as good as you can get (they’ll run in O(n*ln(n)))

Cheers,

Colin

Actually, indexin(a, b) return the highest index in b for each value in a that is a member of b. while ismember in Matlab return the lowest index, which may cause different results unintended.

If using Lia = in.(a, [b]), does it use TimSort to keep within O(n) if the vectors are already sorted? Is TimSort the “default” algorithm for vector methods that require sorting?

(Note that searchsorted doesn’t handle searching for members of a vector within another vector in the same initially requested.)

There’s no sorting taking place. Given vectors `a` and `b`,

``````in.(a, [b])
``````

is equivalent to

``````for x = a
in(x, b)
end
``````

which in turn is equivalent to

``````for x = a
in = false
for y = b
(in = x == y) && break
end
end
``````

In other words, the complexity is `O(nm)` where `n` is the length of vector `a` and `m` is the length of vector `b`.

As stated above, if the vectors are already sorted, you can do much better, but as far as I know there’s no built-in support for that in Julia, in which case you’d have to implement it yourself or look for an existing package.

2 Likes

Thanks for the response! And the really clear explanation of in.(a,[b]) and how it breaks down to in().

For strings arrays use the following:

`````` A= ["Hey","Hello","Hi"];
B= ["Hey","Hi","Goodbye","Bye"];
boolianIndex=occursin.(A,B);

``````

This does not work in version 1.1.0

``````julia> A= ["Hey","Hello","Hi"];

julia> B= ["Hey","Hi","Goodbye","Bye"];

julia> boolianIndex=occursin.(A,B);
ERROR: DimensionMismatch("arrays could not be broadcast to a common size")
Stacktrace:
 top-level scope at none:0
``````
1 Like

Thats correct, `occursin` is for checking whether one string occurs in another string, like `occursin("Hey", "HeyHo")`.
If you want to check whether the strings in Array `A` are in Array `B`, you can still use `in.(A, Ref(B))`.
Or if you want the index, `indexin(A,B)`.

2 Likes

Oh you are right, I wrote this example without testing it.
One of the matrices should be only one string.
Try the following:

``````A= ["Hey"];
B= ["Hey","Hi","Goodbye","Bye"];
boolianIndex=occursin.(A,B);
``````

Thank you! `in.(A, Ref(B))` works perfectly.

If `B` is more than a handful of elements, I highly recommend:

``````in.(A, Ref(Set(B)))
``````

as that’ll enable quick lookups of elements in `B` instead of repeatedly linearly scanning through the entire thing.

3 Likes