# Problem formulation

I have an array `old::Vector{T}` of somewhat complicated structs; I have an array `new::Vector{T}` of other complicated structs. I want an array `result::Vector{T}` which

• contains a single element for each `==` equivalence class for elements from `old` and `new`, and
• preserves the order, i.e. all elements from `old` precede, in `result`, the elements from `new`; I’d be happiest if the order of `old` itself were also preserved.

### Potential solution

These requirements are satisfied e.g. by `unique!([old; new])`; the problem with `unique!`, however, is that true equality is rather expensive for structs of type `T`. Also note that `new` is usually much (30–100×) larger than `old`. In terms of sizes: `length(old) ≈ 100_000`, `length(new) ≈ 10_000_000`.

### What have I tried

The solution I have come up with so far:

• pre-hashing: compute some invariants for all objects from `old` and `new`, `hash` them, and store the hash in the objects (as a field of struct `T`). Comparing the saved hashes is then used to determine whether objects are different; this pre-hashing can be threaded at will and does not blow up the size of the arrays (the invariants themselves are too large to store).

Unfortunately, (potential) hash collisions or genuinely equal objects still make the (single-threaded) `unique!` slow (on the positive side: my hashing of invariants has been collision-free so far).
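A rough sketch of such a pre-hashing scheme (the struct `CachedT`, its fields, and the toy “invariant” are all made up for illustration; the real invariants would be computed from `T`’s data):

```julia
# Hypothetical struct caching a hash of its invariants in a field.
mutable struct CachedT
    data::Vector{Int}
    cached_hash::UInt          # filled in by a (threadable) pre-pass
end
# Toy "invariant": the multiset of entries, represented here by sorting.
CachedT(data) = CachedT(data, hash(sort(data)))

function Base.:(==)(a::CachedT, b::CachedT)
    a.cached_hash != b.cached_hash && return false  # cheap rejection
    return sort(a.data) == sort(b.data)             # expensive true equality
end
# Keep hash consistent with ==, so Set/Dict-based unique! works.
Base.hash(a::CachedT, h::UInt) = hash(a.cached_hash, h)
```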

### I’m looking for

• a clever way to multithread `unique!`, or
• a different scheme to achieve similar `result`, or
• (my favourite) for someone to surprise me with a clever solution I’d never think of

Did you define `==` yourself on these types? If so, or if you have the option to, I would try to make it cheaper (e.g. checking fields that are likely to differ early in the comparison, comparing small fields first, etc.). Then, maybe this can help:

```julia
unique(Iterators.flatten((old, new)))
```

An in-place `unique!` doesn’t seem that helpful in this case anyway, since the very act of `[old; new]` is an unavoidable allocation.
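The “cheap comparisons first” suggestion could look like this for a hypothetical struct (the names `Record`, `id`, and `payload` are invented for illustration):

```julia
struct Record
    id::Int                   # cheap to compare
    payload::Vector{Float64}  # expensive to compare
end

function Base.:(==)(a::Record, b::Record)
    a.id == b.id || return false   # short-circuit on the cheap field first
    return a.payload == b.payload  # expensive check only when needed
end
```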

I did define `==` for `T` and it’s a costly function; the point is that the `==` relation may identify objects which have completely different representations, so comparing fields is definitely not correct.

Thanks for the tip with `unique(Iterators.flatten(...))`, but unfortunately it is slower than my use of `unique!`. Maybe some pseudocode will help to better illustrate the problem. Here `produce_new(x, y)` is a function that takes two structs and mixes them together to produce a new one.

```julia
function generate_thr(S::AbstractVector{T}, produce_new; n=4) where T
    old = unique!([one(first(S)), S...])
    sizes = [length(old)]
    new = similar(old, 0)

    for r in 2:n
        resize!(new, (length(old) - 1) * length(S))
        for old_idx in 1:length(old)-1
            b = old[old_idx + 1]
            for S_idx in 1:length(S)
                g = produce_new(b, S[S_idx])
                hash(g) # stores hash of invariants in g
                new[length(S) * (old_idx - 1) + S_idx] = g
            end
        end

        append!(old, new)
        resize!(new, 0) # to allow GC to free memory, if necessary
        unique!(old)
        # old = unique(Iterators.flatten((old, new)))
        push!(sizes, length(old))
    end
    return old, sizes
end
```

A simple and relatively effective way to multithread this is to split the list into n chunks, deduplicate each chunk, and compute the union of the results.
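Sketched with `Threads.@spawn` (`threaded_unique` is a made-up name; note that plain `Set`s discard the insertion order the original question asks for):

```julia
function threaded_unique(v::Vector{T}; nchunks = Threads.nthreads()) where T
    chunksize = max(1, cld(length(v), nchunks))
    chunks = Iterators.partition(v, chunksize)
    tasks = [Threads.@spawn Set(c) for c in chunks]    # dedup each chunk in parallel
    merged = reduce(union!, fetch.(tasks); init = Set{T}())
    return collect(merged)                             # order is NOT preserved
end
```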

I was thinking about partitioning `[old; new]` into more parts and computing `union`s in a tree-like reduction (which would provide nice threading opportunities with `Threads.@spawn`), but couldn’t come up with anything fast ;/ That said, I don’t have much experience with the new threading interface…


If I am not wrong, `unique!` is faster if the elements are sorted (even with the overhead of sorting, calling `sort!` and then `unique!` is better than calling `unique!` alone). This raises a couple of questions:

• Can you define an ordering for your type? It does not need to make much semantic sense, only needs to be consistent and, preferably, fast to compute.
• Do you need the elements in both `Vectors` to stay in the order they already were?
• If you can sort each list individually, then you can combine them while removing duplicates in `O(n)`.
• that’s a good idea with sorting; I have an ordering, but it’s not consistent with the `[old; new]` order requirement; maybe this could be tweaked, though…
• I’d prefer them to, but that’s not a strict requirement
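If both vectors can be sorted individually, the combine-while-deduplicating `O(n)` pass mentioned above could be sketched like this (`merge_unique` is a made-up name; it assumes both inputs are already sorted and returns a sorted result, so the original `[old; new]` order is traded for sorted order):

```julia
function merge_unique(a::Vector{T}, b::Vector{T}) where T
    out = T[]
    i, j = 1, 1
    while i <= length(a) || j <= length(b)
        # Take the smaller head element (preferring `a` on ties).
        if j > length(b) || (i <= length(a) && a[i] <= b[j])
            x = a[i]; i += 1
        else
            x = b[j]; j += 1
        end
        # Skip duplicates: in sorted order they are always adjacent.
        (isempty(out) || out[end] != x) && push!(out, x)
    end
    return out
end
```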

following the `fib` example:

```julia
function my_unique(itr, threshold=256)
    length(itr) < threshold && return unique!(itr)
    k = length(itr) ÷ 2
    right = Threads.@spawn my_unique(itr[k+1:end], threshold)
    return union!(my_unique(itr[1:k], threshold), fetch(right))
end
```

is this the pattern I should follow?

Use an `OrderedSet`, only insert new elements, collect when done.

It also preserves insertion order AFAIK, and has a cool API that avoids doing hash-table lookups twice.
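AFAIK `OrderedSet` is provided by OrderedCollections.jl (and re-exported by DataStructures.jl). A Base-only sketch of the same idea, preserving first-insertion order (`ordered_unique` is a made-up name):

```julia
function ordered_unique(old, new)
    seen = Set{eltype(old)}()
    result = eltype(old)[]
    for x in Iterators.flatten((old, new))
        if !(x in seen)       # only insert elements not seen before
            push!(seen, x)
            push!(result, x)  # result keeps first-insertion order
        end
    end
    return result
end
```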


FYI ThreadsX.jl has `ThreadsX.unique`. It should work with many iterator transforms, including `Iterators.flatten`.

Here is a comparison of `Base.unique` (`"base"`) and `ThreadsX.unique` (`"tx"`) run on a two-core machine (some random machine on GitHub Actions):

| ID | time | GC time | memory | allocations |
|:---|-----:|--------:|-------:|------------:|
| `["unique", "rand(1:10, 1000000)", "base"]` | 9.666 ms (5%) | | 832 bytes (1%) | 8 |
| `["unique", "rand(1:10, 1000000)", "tx"]` | 5.080 ms (5%) | | 50.98 KiB (1%) | 882 |
| `["unique", "rand(1:1000, 1000000)", "base"]` | 8.653 ms (5%) | | 65.95 KiB (1%) | 27 |
| `["unique", "rand(1:1000, 1000000)", "tx"]` | 5.377 ms (5%) | | 1.07 MiB (1%) | 1186 |

Here is the benchmark script: https://github.com/tkf/ThreadsX.jl/blob/7a98c2407a45a02d818c85729570e47c3dbccd8f/benchmark/bench_unique.jl


That link doesn’t work for me (this one does), but probably the easiest way to run the benchmark:

```julia
julia> using ThreadsX, BenchmarkTools

julia> include(joinpath(pathof(ThreadsX), "..", "..", "benchmark", "bench_unique.jl"))
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "rand(1:10, 1000000)" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "base" => Benchmark(evals=1, seconds=5.0, samples=10000)
      "tx" => Benchmark(evals=1, seconds=5.0, samples=10000)
  "rand(1:1000, 1000000)" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "base" => Benchmark(evals=1, seconds=5.0, samples=10000)
      "tx" => Benchmark(evals=1, seconds=5.0, samples=10000)

julia> run(ans)
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "rand(1:10, 1000000)" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "base" => Trial(6.119 ms)
      "tx" => Trial(485.582 μs)
  "rand(1:1000, 1000000)" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "base" => Trial(5.636 ms)
      "tx" => Trial(1.401 ms)
```

With 18 threads, I saw an over 12x improvement with 10 unique items, and a 4x improvement with 1000 unique items.

I’d expect to see bigger improvements the more expensive the comparisons are.


Oops, that was using a commit I hadn’t pushed. It should work now. Thanks for letting me know.

Yeah, I need to improve scalability somehow. The allocator/GC could be a bottleneck for something like this but I haven’t looked at it carefully.

Yes, I’ve already stumbled upon ThreadsX. I should shout-out a big THANKS to @tkf for contributing this package!

I may have one question though: at the moment I’m doing

```julia
new = ThreadsX.collect((g = produce_new(b, s); hash(g); g) for (b, s) in Iterators.product(old, S))
append!(old, new)
```

does `ThreadsX.unique` come with the guarantee of preserving the order of elements in `old`?
i.e. if I know that `unique(old) == old`, can I assume that

```julia
old_copy = deepcopy(old)
append!(old, new)
old = ThreadsX.unique(old)
@assert old[1:length(old_copy)] == old_copy
```

?

Thanks for running those!
For me the real problems start at sizes one cannot `@benchmark`.
On my i9-9900X:

```julia
julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9900X CPU @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
```

I get these timings:

```julia
# Base.collect; Base.unique
22.669670 seconds (251.44 M allocations: 11.994 GiB, 24.77% gc time)
12.160706 seconds (252.21 M allocations: 12.222 GiB, 32.47% gc time)
17.786907 seconds (421.10 M allocations: 15.169 GiB, 44.75% gc time)
```

For this test `sizes = [49, 1777, 57725, 1777541]`, so the most costly `unique` is computed on an array of `2743229` elements with `1777541` unique ones.

Also, for a bi-threaded `==` I get

```julia
# Base.collect; Base.unique
30.702327 seconds (263.90 M allocations: 13.323 GiB, 23.79% gc time)
20.626636 seconds (263.81 M allocations: 13.509 GiB, 33.23% gc time)
21.210533 seconds (433.29 M allocations: 16.518 GiB, 42.18% gc time)
```

so probably my handling of `@spawn` in `==` is still not correct.

@tkf, if I may have just one small gripe about ThreadsX: it’s the length of its dependency list. Most of them seem OK, but `DelimitedFiles`, `Dates`, and `ZygoteRules` look out of place:

```
   Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
Updating `~/.julia/dev/Groups/Project.toml`
Updating `~/.julia/dev/Groups/Manifest.toml`
[dce04be8] + ArgCheck v2.1.0
[198e06fe] + BangBang v0.3.29
[34da2185] + Compat v3.19.0
[a33af91c] + CompositionsBase v0.1.0
[187b0558] + ConstructionBase v1.0.0
[9a962f9c] + DataAPI v1.3.0
[e2d170a0] + DataValueInterfaces v1.0.0
[244e2a9f] + DefineSingletons v0.1.0
[22cec73e] + InitialValues v0.2.10
[82899510] + IteratorInterfaceExtensions v1.0.0
[1914dd2f] + MacroTools v0.5.5
[42d2dcc6] + Referenceables v0.1.0
[ae029012] + Requires v1.1.0
[efcf1570] + Setfield v0.7.0
[171d559e] + SplittablesBase v0.1.10
[3783bdb8] + TableTraits v1.0.0
[bd369af6] + Tables v1.1.0
[28d57a85] + Transducers v0.4.52
[700de1a5] + ZygoteRules v0.2.0
[8bb1440f] + DelimitedFiles
[9fa8497b] + Future
[76f85450] + LibGit2
[44cfe95a] + Pkg
[de0858da] + Printf
[3fa0cd96] + REPL
[ea8e919c] + SHA
[1a1011a3] + SharedArrays
[10745b16] + Statistics
[cf7118a7] + UUIDs
[4ec0a83e] + Unicode
``````

ThreadsX does not depend on any of those 3 packages you listed. Likely another package in your Project.toml did, or they are secondary dependencies (dependencies of packages that ThreadsX does depend on). Two of the three are stdlib packages though, so not usually a concern with those anyway.


It looks like Dates and DelimitedFiles come from Compat. But I wouldn’t worry about them since they are stdlib. ZygoteRules comes from BangBang and it was kind of a nice thing to have for supporting differentiable maybe-inplace functions: [ANN] BangBang.jl: a compatible layer for mutable and immutable data structures (bonus: Zygote.jl support). Minimalistic interface packages like ZygoteRules are the foundation of interoperability in the Julia ecosystem, and I would rather cheer that one is in the dependency chain of any package (though I should be switching to ChainRulesCore).

Anyway, the heaviest dependency of ThreadsX is Transducers and I really should be dissolving Transducers to smaller packages. Until then, I think everything else in the dependency chain is negligible.

Yeah, the result of `ThreadsX.unique` is supposed to be identical to `Base.unique`’s, as long as the user function does not have data races (and there are no relevant bugs in ThreadsX).


thanks!