How would I check for unique values across many arrays without for loops?

Julia1 · May 28, 2020, 2:14pm

Hi how’s it going?

Say I have 3 arrays:

X = [1,2,3,4,5]
Y = [1,2,3,4]
Z = [1,2,3,4]

I want to find that, of the 5 unique values across X, Y and Z, only the number 5 is truly unique, because it occurs in only one array.

Now obviously with 3 lists there are many ways to do this, with the simplest and most naive being “For val in x, if val not in y and z…” etc.

But say I have hundreds of arrays and I want to check this. What would be the best way to implement it?

rdeits · May 28, 2020, 2:20pm

without for loops?

By the way, there is absolutely no reason to avoid loops in Julia unless doing so helps you write clearer or easier to understand code. Loops in Julia are fast, and a well-written loop is usually the fastest way to solve a particular problem.

pdeffebach · May 28, 2020, 2:32pm

You want to use countmap (from StatsBase). Unfortunately it doesn’t seem to accept generic iterators, so you can’t use Iterators.flatten.

You can do

using StatsBase  # provides countmap
array_of_arrays = [X, Y, Z]
countmap(reduce(vcat, array_of_arrays))  # Dict mapping each value to its number of occurrences

Then you can loop through the Dict and keep just the things with the value 1.
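For illustration, that last step could look roughly like the sketch below. It assumes StatsBase is installed, and the names `counts` and `singles` are just placeholders, not anything from a package.

```julia
using StatsBase  # provides countmap

X = [1, 2, 3, 4, 5]
Y = [1, 2, 3, 4]
Z = [1, 2, 3, 4]

# Concatenate all arrays and count how often each value occurs.
counts = countmap(reduce(vcat, [X, Y, Z]))

# Keep only the values whose count is exactly 1.
singles = [value for (value, n) in counts if n == 1]
```

Note that this counts occurrences, not arrays: a value repeated inside a single array would be counted more than once. If that matters, one option is to deduplicate each array first, e.g. `reduce(vcat, unique.([X, Y, Z]))`.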

Julia1 · June 2, 2020, 4:10pm

thank you very much!

sijo · June 2, 2020, 4:29pm

Is this not a bit of an exaggeration? I think I saw several comments/issues where people were getting better performance with vectorized code, mostly due to how particular packages are implemented. I think one case was with Distributions.jl, maybe in relation to Turing.jl, but I can’t find it again. Another came up recently here: Speed of vectorized vs for-loops using Zygote - #3 by ChrisRackauckas .

kristoffer.carlsson · June 2, 2020, 4:32pm

If Zygote is slow on loops but not broadcast then that seems like an issue particular to Zygote and not something that should be applied as a guideline to general Julia code (unless you are writing it specifically to be ADed by Zygote).

sijo · June 2, 2020, 5:12pm

Agreed, I just thought “absolutely no reason” might be a bit strong. People might feel misled if we say that, and then when working on a particular problem they are told that a big slowdown is to be expected with for loops for a well-known package. After all, most serious code will use third-party packages…

harven · June 2, 2020, 5:21pm

counter from the DataStructures library accepts generators so you can bypass the creation of an intermediate array as follows.

 julia> import DataStructures: counter

 julia> sole(v...) = [first(p) for p in counter(x for a in v for x in a) if last(p) == 1]

 julia> sole([1:5;], [1:4;], [0:6;])
 2-element Array{Int64,1}:
  0
  6

Arguably such an algorithm keeps counting after encountering an element twice, which may not be optimal.
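One way to avoid keeping full counts is to track only two sets: values seen so far, and values seen in more than one array. The sketch below is only an illustration using plain loops and Base; `in_one_array` is a hypothetical helper name, and it assumes the elements can be hashed and sorted.

```julia
# Hypothetical helper (not from any package): values that occur in exactly one array.
function in_one_array(arrays...)
    T = promote_type(map(eltype, arrays)...)
    seen = Set{T}()       # values met in at least one array
    repeated = Set{T}()   # values met in two or more arrays
    for a in arrays
        for x in Set(a)   # Set(a) ignores duplicates inside a single array
            if x in seen
                push!(repeated, x)
            else
                push!(seen, x)
            end
        end
    end
    # Sorting is only for a stable output order; it assumes the values are comparable.
    return sort!(collect(setdiff(seen, repeated)))
end
```

With the inputs above, `in_one_array([1:5;], [1:4;], [0:6;])` gives `[0, 6]`. Unlike the counting versions, a value that appears several times within one array still counts as unique to that array.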

Julia used to have a hist function for computing frequencies but it was removed for some reason.
