How would I check for unique values across many arrays without for loops?

Hi, how’s it going?

Say I have 3 arrays:

X = [1,2,3,4,5]
Y = [1,2,3,4]
Z = [1,2,3,4]

I want to find that, of the 5 distinct values across X, Y and Z, only the number 5 is truly unique, because it occurs in only one array.

Now, obviously, with 3 arrays there are many ways to do this, the simplest and most naive being “for val in X, if val not in Y and Z…” etc.

But say I have hundreds of arrays and want to check this. What would be the best way to implement it?
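Generalizing the naive check to many arrays means, for each array, removing every value that appears in any of the other arrays. A minimal Base-only sketch (the name `naive_sole` is hypothetical, not from any package); because it rebuilds the union of the other arrays once per array, the work grows roughly quadratically with the number of arrays:

```julia
# Hypothetical helper: values that occur in exactly one of the given arrays,
# found by checking each array against the union of all the other arrays.
function naive_sole(arrays)
    T = eltype(eltype(arrays))
    result = T[]
    for i in eachindex(arrays)
        # Union of every array except the i-th one (empty if there is only one array)
        others = union(T[], (arrays[j] for j in eachindex(arrays) if j != i)...)
        # Values of arrays[i] that none of the other arrays contain
        append!(result, setdiff(arrays[i], others))
    end
    return result
end

X = [1, 2, 3, 4, 5]
Y = [1, 2, 3, 4]
Z = [1, 2, 3, 4]
naive_sole([X, Y, Z])  # [5]
```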

without for loops?

By the way, there is absolutely no reason to avoid loops in Julia unless doing so helps you write clearer or easier to understand code. Loops in Julia are fast, and a well-written loop is usually the fastest way to solve a particular problem.
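For instance, a plain loop handles hundreds of arrays in one pass: count, for every value, how many arrays contain it, then keep the values whose count is 1. A minimal sketch; the name `sole_loop` is hypothetical, and wrapping each array in a `Set` is a design choice so that a value repeated inside a single array still counts as appearing in only that array:

```julia
# Hypothetical helper: values that occur in exactly one of the given arrays.
function sole_loop(arrays)
    T = eltype(eltype(arrays))
    counts = Dict{T,Int}()          # value => number of arrays containing it
    for a in arrays
        for x in Set(a)             # Set(a) ignores repeats within one array
            counts[x] = get(counts, x, 0) + 1
        end
    end
    return [x for (x, c) in counts if c == 1]
end

X = [1, 2, 3, 4, 5]
Y = [1, 2, 3, 4]
Z = [1, 2, 3, 4]
sole_loop([X, Y, Z])  # [5]
```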

You want to use countmap (from StatsBase.jl). Unfortunately, it doesn’t seem to accept generic iterators, so you can’t use Iterators.flatten.

You can do

using StatsBase  # provides countmap

array_of_arrays = [X, Y, Z]
countmap(reduce(vcat, array_of_arrays))

Then you can loop through the resulting Dict and keep only the keys whose count is 1.
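The count-then-filter step can also be written without an explicit `for`. So that the sketch runs without StatsBase, the hypothetical helper `count_values` below is a Base-only stand-in for `countmap`: it folds over a lazy `Iterators.flatten`, which chains the arrays instead of concatenating them with `reduce(vcat, …)`. Applying `unique` to each array first is an assumption about the intended semantics: it makes each count mean "number of arrays containing the value" rather than "total occurrences".

```julia
# Hypothetical Base-only stand-in for StatsBase.countmap: maps each value to
# the number of times it occurs in `itr`.
function count_values(itr)
    foldl(itr; init = Dict{eltype(itr),Int}()) do d, x
        d[x] = get(d, x, 0) + 1
        d
    end
end

X = [1, 2, 3, 4, 5]
Y = [1, 2, 3, 4]
Z = [1, 2, 3, 4]
array_of_arrays = [X, Y, Z]

# unique.(...) drops repeats within each array; Iterators.flatten then
# iterates over all the (deduplicated) arrays in sequence
counts = count_values(Iterators.flatten(unique.(array_of_arrays)))

# Keep only the values whose count is 1, either with a comprehension...
sole_values = [k for (k, c) in counts if c == 1]
# ...or by filtering the Dict's key => count pairs
sole_values2 = collect(keys(filter(p -> p.second == 1, counts)))
```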

Thank you very much!

Is this not a bit of an exaggeration? I think I saw several comments/issues where people were getting better performance with vectorized code, mostly due to how particular packages are implemented. I think one case was with Distributions.jl, maybe in relation to Turing.jl, but I can’t find it again. Another came up recently here: Speed of vectorized vs for-loops using Zygote - #3 by ChrisRackauckas.

If Zygote is slow on loops but not on broadcasting, then that seems like an issue particular to Zygote and not something that should be applied as a guideline to general Julia code (unless you are writing it specifically to be differentiated by Zygote).

Agreed, I just thought “absolutely no reason” might be a bit strong. People might feel misled if we say that and then, when working on a particular problem, are told that a big slowdown is to be expected with for loops in a well-known package. After all, most serious code will use third-party packages…

counter from the DataStructures.jl package accepts generators, so you can bypass the creation of an intermediate array as follows:

 julia> import DataStructures: counter

 julia> sole(v...) = [first(p) for p in counter(x for a in v for x in a) if last(p) == 1]

 julia> sole([1:5;], [1:4;], [0:6;])
 2-element Array{Int64,1}:
  0
  6

Arguably, such an algorithm keeps counting an element after encountering it twice, which may not be optimal.
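One way to avoid keeping full counts is to record only whether a value has been seen once or more than once. A minimal sketch with two Base `Set`s, using the same varargs call style as `sole` above (the name `sole_sets` is hypothetical). It still visits every element, but no counts are kept, and it dedupes each array first so a value repeated within one array is not treated as shared:

```julia
# Hypothetical helper: values that occur in exactly one of the given arrays.
function sole_sets(arrays...)
    T = promote_type(map(eltype, arrays)...)
    seen = Set{T}()       # values found in at least one array
    repeated = Set{T}()   # values found in two or more arrays
    for a in arrays
        for x in Set(a)   # dedupe within this array
            x in seen ? push!(repeated, x) : push!(seen, x)
        end
    end
    return collect(setdiff(seen, repeated))
end

sole_sets([1:5;], [1:4;], [0:6;])  # 0 and 6, in unspecified order
```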

Julia used to have a hist function for computing frequencies, but it was removed for some reason.