FYI, ThreadsX.jl has ThreadsX.unique. It should work with many iterator transforms, including Iterators.flatten.
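A minimal sketch of how this might look, assuming ThreadsX.jl is installed and Julia was started with more than one thread (e.g. `julia -t 2`). The variable names and input sizes are made up for illustration, and the results are compared as sets so the check does not rely on any particular output order:

```julia
using ThreadsX  # external package, assumed installed via Pkg.add("ThreadsX")

# Plain array input: should agree with Base.unique as a set of values.
xs = rand(1:1000, 1_000_000)
u_base = unique(xs)
u_tx = ThreadsX.unique(xs)
@assert Set(u_tx) == Set(u_base)

# Input wrapped in an iterator transform (hypothetical example data).
chunks = [rand(1:10, 1000) for _ in 1:100]
u_flat = ThreadsX.unique(Iterators.flatten(chunks))
@assert Set(u_flat) == Set(unique(Iterators.flatten(chunks)))
```

If an iterator transform turns out not to be supported, collecting it into an array first and passing that to ThreadsX.unique is a simple fallback.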
Here is a comparison of Base.unique ("base") and ThreadsX.unique ("tx"), run on a two-core machine (some random machine on GitHub Actions):
| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["unique", "rand(1:10, 1000000)", "base"] | 9.666 ms (5%) | | 832 bytes (1%) | 8 |
| ["unique", "rand(1:10, 1000000)", "tx"] | 5.080 ms (5%) | | 50.98 KiB (1%) | 882 |
| ["unique", "rand(1:1000, 1000000)", "base"] | 8.653 ms (5%) | | 65.95 KiB (1%) | 27 |
| ["unique", "rand(1:1000, 1000000)", "tx"] | 5.377 ms (5%) | | 1.07 MiB (1%) | 1186 |
Here is the benchmark script: https://github.com/tkf/ThreadsX.jl/blob/7a98c2407a45a02d818c85729570e47c3dbccd8f/benchmark/bench_unique.jl