Julian way of MATLAB ismember function?

leejm516 · May 16, 2017, 8:44am

Let two arrays a = ["1", "4", "7"] and b = ["1","2","3,"4","5"].

In MATLAB, [Lia,Locb] = ismember(a, b) will lead to

Lia = [1, 1, 0]
Locb = [1, 4, 0]

Quickly looking over docs, I coudn’t find a Julian equivalent of ismember.
Instead I can think two alternative ways.

# Way 1: Using comprehending lists
Lia = [each_item in b for each_item in a]
Locb = [findfirst(b, each_item) for each_item in a]

# Way 2: Vectorization (edited)
Lia = broadcast(x -> in(x, b), a)
Locb = broadcast(x -> findfirst(b, x), a)

Any better ways?

GunnarFarneback · May 16, 2017, 10:15am

If you want it to be short you can do

Lia = in.(a, [b])
LocB = findfirst.([b], a)

In practice I’d look at what the results of ismember are being used for. It’s quite possible that in Julia you wouldn’t need to form those vectors explicitly.

wonjinChoi · May 16, 2017, 10:18am

I found that there is an indexin function. It’s not exactly same as matlab, but…

indexin(a, b)

  Returns a vector containing the highest index in b for each value in a that is a member of b . The output vector
  contains 0 wherever a is not a member of b.

  julia> a = ['a', 'b', 'c', 'b', 'd', 'a'];

  julia> b = ['a','b','c'];

  julia> indexin(a,b)
  6-element Array{Int64,1}:
   1
   2
   3
   2
   0
   1

  julia> indexin(b,a)
  3-element Array{Int64,1}:
   6
   4
   3

Dan · May 16, 2017, 10:26am

As wonjinChoi said, indexin should do the heavy lifting. And Lia can be calculated from LocB to save time. In code:
LocB = indexin(a,b) Lia = 1.*(LocB .!= 0)

this gives the following:

julia> a = ['1', '4', '7']; b = ['1','2','3','4','5'];

julia> LocB = indexin(a,b)
3-element Array{Int64,1}:
 1
 4
 0

julia> Lia = 1.*(LocB .!= 0)
3-element Array{Int64,1}:
 1
 1
 0

leejm516 · May 16, 2017, 11:38pm

Thank you for all suggestions.

@GunnarFarneback I’m porting a MATLAB toolbox (COBRA Toolbox; https://github.com/opencobra/cobratoolbox/) into Julia as a practice. The whole story is too long… In this toolbox, this function is employed to check whether a newly added row(s) (metabolites) already exists in the sparse matrix.

colintbowers · May 17, 2017, 11:09am

Is it just by accident that your two input arrays in your example are sorted? If you can guarantee sort order of your inputs then typically you can solve this problem much faster that using in, indexin, or findfirst (or ismember in Matlab). For the case where one of your input arrays is very small relative to the other you can use searchsorted on each element of the small array, so you get k calls to an O(ln(n)) algorithm. For the case where both input arrays are of similar size you can use a simultaneous single loop over both arrays, hence the algorithm is O(n). There are probably several examples of this algorithm online but if you want one in Julia I have it so let me know and I’ll post it here.

leejm516 · May 18, 2017, 2:48am

Ahh, that was just by accident. Elements of the array do not need to be sorted.
I learned some basics of algorithms and Big-Oh notations but have almost forgotten
If you would show me some examples, I’d really appreciate it.

colintbowers · May 18, 2017, 3:48am

There’s no way I’ll ever manage to give a better explanation of Big-Oh notation than you’ll find here:

FYI, given that you can’t guarantee sort order of inputs, the answers provided by others here are as good as you can get (they’ll run in O(n*ln(n)))

Cheers,

Colin

Caesar_KG · September 7, 2018, 3:56am

Actually, indexin(a, b) return the highest index in b for each value in a that is a member of b. while ismember in Matlab return the lowest index, which may cause different results unintended.

PearGod · January 3, 2019, 8:20pm

If using Lia = in.(a, [b]), does it use TimSort to keep within O(n) if the vectors are already sorted? Is TimSort the “default” algorithm for vector methods that require sorting?

(Note that searchsorted doesn’t handle searching for members of a vector within another vector in the same initially requested.)

bennedich · January 3, 2019, 9:22pm

There’s no sorting taking place. Given vectors a and b,

in.(a, [b])

is equivalent to

for x = a
    in(x, b)
end

which in turn is equivalent to

for x = a
    in = false
    for y = b
        (in = x == y) && break
    end
end

In other words, the complexity is O(nm) where n is the length of vector a and m is the length of vector b.

As stated above, if the vectors are already sorted, you can do much better, but as far as I know there’s no built-in support for that in Julia, in which case you’d have to implement it yourself or look for an existing package.

PearGod · January 5, 2019, 12:12am

Thanks for the response! And the really clear explanation of in.(a,[b]) and how it breaks down to in().

Amin_Yahyaabadi · June 16, 2019, 10:42pm

For strings arrays use the following:

 A= ["Hey","Hello","Hi"];
 B= ["Hey","Hi","Goodbye","Bye"];
boolianIndex=occursin.(A,B);

johann.spies · June 20, 2019, 10:20am

This does not work in version 1.1.0

julia> A= ["Hey","Hello","Hi"];

julia> B= ["Hey","Hi","Goodbye","Bye"];

julia> boolianIndex=occursin.(A,B); 
ERROR: DimensionMismatch("arrays could not be broadcast to a common size")
Stacktrace:
 [1] _bcs1 at ./broadcast.jl:438 [inlined]
 [2] _bcs at ./broadcast.jl:432 [inlined]
 [3] broadcast_shape at ./broadcast.jl:426 [inlined]
 [4] combine_axes at ./broadcast.jl:421 [inlined]
 [5] instantiate at ./broadcast.jl:255 [inlined]
 [6] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(occursin),Tuple{Array{String,1},Array{String,1}}}) at ./broadcast.jl:753
 [7] top-level scope at none:0

BeastyBlacksmith · June 20, 2019, 12:43pm

Thats correct, occursin is for checking whether one string occurs in another string, like occursin("Hey", "HeyHo").
If you want to check whether the strings in Array A are in Array B, you can still use in.(A, Ref(B)).
Or if you want the index, indexin(A,B).

Amin_Yahyaabadi · June 20, 2019, 12:52pm

Oh you are right, I wrote this example without testing it.
One of the matrices should be only one string.
Try the following:

A= ["Hey"];
B= ["Hey","Hi","Goodbye","Bye"];
boolianIndex=occursin.(A,B);

Thank you! in.(A, Ref(B)) works perfectly.

mbauman · June 20, 2019, 2:24pm

If B is more than a handful of elements, I highly recommend:

in.(A, Ref(Set(B)))

as that’ll enable quick lookups of elements in B instead of repeatedly linearly scanning through the entire thing.

Topic		Replies	Views
Julia Equivalent for ismember in MATLAB for rows New to Julia	14	1829	March 30, 2020
Some_thing in some_list New to Julia question	11	489	January 29, 2024
Create a boolean mask between arrays of different size General Usage arrays , boolean	10	132	October 30, 2024
Indexin() for two identical vectors New to Julia	2	484	August 13, 2018
Indices of intersection of two arrays General Usage arrays	25	10736	April 19, 2024

Julian way of MATLAB ismember function?

Related topics