Let two arrays a = ["1", "4", "7"] and b = ["1","2","3,"4","5"].
In MATLAB, [Lia,Locb] = ismember(a, b) will lead to
Lia = [1, 1, 0]
Locb = [1, 4, 0]
Quickly looking over docs, I coudn’t find a Julian equivalent of ismember.
Instead I can think two alternative ways.
# Way 1: Using comprehending lists
Lia = [each_item in b for each_item in a]
Locb = [findfirst(b, each_item) for each_item in a]
# Way 2: Vectorization (edited)
Lia = broadcast(x -> in(x, b), a)
Locb = broadcast(x -> findfirst(b, x), a)
In practice I’d look at what the results of ismember are being used for. It’s quite possible that in Julia you wouldn’t need to form those vectors explicitly.
I found that there is an indexin function. It’s not exactly same as matlab, but…
indexin(a, b)
Returns a vector containing the highest index in b for each value in a that is a member of b . The output vector
contains 0 wherever a is not a member of b.
julia> a = ['a', 'b', 'c', 'b', 'd', 'a'];
julia> b = ['a','b','c'];
julia> indexin(a,b)
6-element Array{Int64,1}:
1
2
3
2
0
1
julia> indexin(b,a)
3-element Array{Int64,1}:
6
4
3
As wonjinChoi said, indexin should do the heavy lifting. And Lia can be calculated from LocB to save time. In code: LocB = indexin(a,b) Lia = 1.*(LocB .!= 0)
this gives the following:
julia> a = ['1', '4', '7']; b = ['1','2','3','4','5'];
julia> LocB = indexin(a,b)
3-element Array{Int64,1}:
1
4
0
julia> Lia = 1.*(LocB .!= 0)
3-element Array{Int64,1}:
1
1
0
@GunnarFarneback I’m porting a MATLAB toolbox (COBRA Toolbox; https://github.com/opencobra/cobratoolbox/) into Julia as a practice. The whole story is too long… In this toolbox, this function is employed to check whether a newly added row(s) (metabolites) already exists in the sparse matrix.
Is it just by accident that your two input arrays in your example are sorted? If you can guarantee sort order of your inputs then typically you can solve this problem much faster that using in, indexin, or findfirst (or ismember in Matlab). For the case where one of your input arrays is very small relative to the other you can use searchsorted on each element of the small array, so you get k calls to an O(ln(n)) algorithm. For the case where both input arrays are of similar size you can use a simultaneous single loop over both arrays, hence the algorithm is O(n). There are probably several examples of this algorithm online but if you want one in Julia I have it so let me know and I’ll post it here.
Ahh, that was just by accident. Elements of the array do not need to be sorted.
I learned some basics of algorithms and Big-Oh notations but have almost forgotten
If you would show me some examples, I’d really appreciate it.
Actually, indexin(a, b) return the highest index in b for each value in a that is a member of b. while ismember in Matlab return the lowest index, which may cause different results unintended.
If using Lia = in.(a, [b]), does it use TimSort to keep within O(n) if the vectors are already sorted? Is TimSort the “default” algorithm for vector methods that require sorting?
(Note that searchsorted doesn’t handle searching for members of a vector within another vector in the same initially requested.)
There’s no sorting taking place. Given vectors a and b,
in.(a, [b])
is equivalent to
for x = a
in(x, b)
end
which in turn is equivalent to
for x = a
in = false
for y = b
(in = x == y) && break
end
end
In other words, the complexity is O(nm) where n is the length of vector a and m is the length of vector b.
As stated above, if the vectors are already sorted, you can do much better, but as far as I know there’s no built-in support for that in Julia, in which case you’d have to implement it yourself or look for an existing package.
julia> A= ["Hey","Hello","Hi"];
julia> B= ["Hey","Hi","Goodbye","Bye"];
julia> boolianIndex=occursin.(A,B);
ERROR: DimensionMismatch("arrays could not be broadcast to a common size")
Stacktrace:
[1] _bcs1 at ./broadcast.jl:438 [inlined]
[2] _bcs at ./broadcast.jl:432 [inlined]
[3] broadcast_shape at ./broadcast.jl:426 [inlined]
[4] combine_axes at ./broadcast.jl:421 [inlined]
[5] instantiate at ./broadcast.jl:255 [inlined]
[6] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(occursin),Tuple{Array{String,1},Array{String,1}}}) at ./broadcast.jl:753
[7] top-level scope at none:0
Thats correct, occursin is for checking whether one string occurs in another string, like occursin("Hey", "HeyHo").
If you want to check whether the strings in Array A are in Array B, you can still use in.(A, Ref(B)).
Or if you want the index, indexin(A,B).