Julian way of MATLAB ismember function?


#1

https://www.mathworks.com/help/matlab/ref/ismember.html

Let two arrays a = ["1", "4", "7"] and b = ["1","2","3,"4","5"].

In MATLAB, [Lia,Locb] = ismember(a, b) will lead to

Lia = [1, 1, 0]
Locb = [1, 4, 0]

Quickly looking over docs, I coudn’t find a Julian equivalent of ismember.
Instead I can think two alternative ways.

# Way 1: Using comprehending lists
Lia = [each_item in b for each_item in a]
Locb = [findfirst(b, each_item) for each_item in a]

# Way 2: Vectorization (edited)
Lia = broadcast(x -> in(x, b), a)
Locb = broadcast(x -> findfirst(b, x), a)

Any better ways?


#3

If you want it to be short you can do

Lia = in.(a, [b])
LocB = findfirst.([b], a)

In practice I’d look at what the results of ismember are being used for. It’s quite possible that in Julia you wouldn’t need to form those vectors explicitly.


#4

I found that there is an indexin function. It’s not exactly same as matlab, but…

indexin(a, b)

  Returns a vector containing the highest index in b for each value in a that is a member of b . The output vector
  contains 0 wherever a is not a member of b.

  julia> a = ['a', 'b', 'c', 'b', 'd', 'a'];

  julia> b = ['a','b','c'];

  julia> indexin(a,b)
  6-element Array{Int64,1}:
   1
   2
   3
   2
   0
   1

  julia> indexin(b,a)
  3-element Array{Int64,1}:
   6
   4
   3

#5

As wonjinChoi said, indexin should do the heavy lifting. And Lia can be calculated from LocB to save time. In code:
LocB = indexin(a,b) Lia = 1.*(LocB .!= 0)

this gives the following:

julia> a = ['1', '4', '7']; b = ['1','2','3','4','5'];

julia> LocB = indexin(a,b)
3-element Array{Int64,1}:
 1
 4
 0

julia> Lia = 1.*(LocB .!= 0)
3-element Array{Int64,1}:
 1
 1
 0

#6

Thank you for all suggestions.

@GunnarFarneback I’m porting a MATLAB toolbox (COBRA Toolbox; https://github.com/opencobra/cobratoolbox/) into Julia as a practice. The whole story is too long… In this toolbox, this function is employed to check whether a newly added row(s) (metabolites) already exists in the sparse matrix.


#7

Is it just by accident that your two input arrays in your example are sorted? If you can guarantee sort order of your inputs then typically you can solve this problem much faster that using in, indexin, or findfirst (or ismember in Matlab). For the case where one of your input arrays is very small relative to the other you can use searchsorted on each element of the small array, so you get k calls to an O(ln(n)) algorithm. For the case where both input arrays are of similar size you can use a simultaneous single loop over both arrays, hence the algorithm is O(n). There are probably several examples of this algorithm online but if you want one in Julia I have it so let me know and I’ll post it here.


#8

Ahh, that was just by accident. Elements of the array do not need to be sorted.
I learned some basics of algorithms and Big-Oh notations but have almost forgotten :slight_smile:
If you would show me some examples, I’d really appreciate it.


#9

There’s no way I’ll ever manage to give a better explanation of Big-Oh notation than you’ll find here:

FYI, given that you can’t guarantee sort order of inputs, the answers provided by others here are as good as you can get (they’ll run in O(n*ln(n)))

Cheers,

Colin


#10

Actually, indexin(a, b) return the highest index in b for each value in a that is a member of b. while ismember in Matlab return the lowest index, which may cause different results unintended.