Compare array (of String) with no regard to the order

Le’t have 2 arrays (of String, but it doesn’t really matter). I need to compare them, but the order is irrelevant, just that they contain the same Strings/items (and are of the same length, obviously).

i.e. comparison of

x1 = [“a”, “b”, “c”]
x2 = [“c”, “b”, “a”]

should return true (equality).

What would be the fastest, most efficient way to do that?

Thank you.

The fastest way is probably to convert them to key => count Dicts and compare those:

function counter(it)
    y = Dict{eltype(it), Int}()
    for i in it
        y[i] = get(y, i, 0) + 1
    y
end
length(x1) == length(x2) && counter(x1) == counter(x2)

If you need to do this comparison multiple times, you can instead keep the arrays sorted, and compare them directly.

2 Likes
x1 = ["a", "b", "c"]
x2 = ["c", "b", "a"]

using StatsBase: countmap
countmap(x1) == countmap(x2)
2 Likes

If you don’t care about duplicates you can use issetequal:

julia> x1 = ["a", "b", "c"];

julia> x2 = ["c", "b", "a"];

julia> x3 = ["a", "b", "c", "a"]; # duplicates

julia> issetequal(x1, x2)
true

julia> issetequal(x1, x3)
true
6 Likes

i think he cares

Thank you, everyone.
I actually don’t envision in my code any possibility of a duplicate in either array.
Here are interesting efficiency test results of suggested solutions:

@time counter(x1) == counter(x2)
  0.000129 seconds (8 allocations: 1.188 KiB)
true

@time countmap(x1) == countmap(x2)
  0.000082 seconds (8 allocations: 1.188 KiB)
true

@time issetequal(x1, x2)
  0.000009 seconds (8 allocations: 960 bytes)
true
2 Likes

One thing to consider is if you won’t have duplicates and don’t care about order, you should probably be using a set instead of a vector.

2 Likes

As far as I can tell, the most efficient solution, at least for short collections, is:

sort(x1) == sort(x2)
3 Likes

For everyone having the same problem as in this quite old question:

If order really does not matter, even for the surrounding code, the even faster

sort!(x1) == sort!(x2)

might be an option, which obviously modifies both x1 and x2.

1 Like