By the f in f(S) (which appears in \bigcup_{n\ge0} f^n(S)), did you mean a “derived” mapping

\begin{aligned}
f'
&: 2^T \to 2^T \\
&: S \mapsto \bigcup_{t \in S} f(t)
\end{aligned}

? Just making sure, since I often see f(S) in math used to mean the “simply lifted” version S \mapsto \{ f(t) | t \in S \}.
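For concreteness, the two readings could be sketched in Julia like this (`lifted`, `derived`, `g`, and `h` are made-up names; `derived` assumes `f` maps an element to an iterable of elements):

```julia
# “Simply lifted”: f maps elements to elements; apply it pointwise.
lifted(f, S) = Set(f(t) for t in S)

# “Derived”: f maps an element to a collection; take the union over S.
derived(f, S) = mapreduce(t -> Set(f(t)), union, S; init = Set{eltype(S)}())

h(t) = t^2              # element-to-element
g(t) = (t, t + 1)       # element-to-collection

lifted(h, Set([1, 2]))  # == Set([1, 4])
derived(g, Set([1, 2])) # == Set([1, 2, 3])
```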

If this is correct, I suppose I can rephrase the problem as finding a (non-trivial) fixed point \hat S \subset T of \hat f, where

\begin{aligned}
\hat f
&: 2^T \to 2^T \\
&: S \mapsto S \cup f'(S)
\end{aligned}

?
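If so, a sequential baseline is just iterating \hat f while tracking the frontier of new elements, until nothing new appears. A rough sketch, assuming `f` maps an element to an iterable of elements (`naive_fixedpoint` is a made-up name):

```julia
# Iterate Ŝ ← Ŝ ∪ f'(Ŝ), visiting only the frontier of newly added
# elements, until no new element appears.
function naive_fixedpoint(f, inits)
    S = Set(inits)
    frontier = collect(S)
    while !isempty(frontier)
        next = eltype(frontier)[]
        for t in frontier, y in f(t)
            if !(y in S)
                push!(S, y)
                push!(next, y)
            end
        end
        frontier = next
    end
    return S
end

naive_fixedpoint(t -> (t ÷ 2,), [12])  # == Set([12, 6, 3, 1, 0])
```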

If I understand the problem statement correctly, a simple solution may be:

```julia
function fixedpoint(
    f!,
    inits,
    T = eltype(inits);
    ntasks = max(1, Threads.nthreads() - 1),
    buffersize = max(length(inits), 2^20),
)
    inputs = Channel{T}(buffersize)
    outputs = Channel{Vector{T}}(buffersize)
    fp = Set{T}()
    ninputs = 0
    for x in inits
        # Seed `fp` with the initial elements so that \hat S ⊇ S and so
        # that duplicated `inits` are not processed twice.
        if !(x in fp)
            push!(fp, x)
            put!(inputs, x)
            ninputs += 1
        end
    end
    @sync try
        for _ in 1:ntasks
            Threads.@spawn try
                for x in inputs
                    # Note: this can be further optimized by pooling allocated
                    # `ys` inside the centralized `while ninputs > 0` loop.
                    ys = eltype(outputs)(undef, 0)
                    f!(ys, x)
                    put!(outputs, ys)
                end
            finally
                # Closing both channels propagates termination (including
                # errors) to the other workers and the feedback loop.
                close(outputs)
                close(inputs)
            end
        end
        # Centralized feedback loop: all duplicate detection happens here,
        # which makes tracking the number of pending inputs trivial.
        while ninputs > 0
            ys = take!(outputs)
            ninputs -= 1
            for y in ys
                if !(y in fp)
                    ninputs += 1
                    if ninputs > buffersize
                        error("pending inputs exceed buffersize")
                    end
                    put!(inputs, y)
                    push!(fp, y)
                end
            end
        end
    finally
        close(outputs)
        close(inputs)
    end
    return fp
end

# MWE (the rounding keeps the generated set finite)
function f!(ys, x)
    d = √(1 - x)
    y1 = round((1 + d) / 3; digits = 10)
    y2 = round((1 - d) / 3; digits = 10)
    push!(ys, y1)
    push!(ys, y2)
end
fp = fixedpoint(f!, [0.25])
```

The point is to handle the duplicate detection in one place so that it is easy to track the number of pending inputs. It also has the extra benefit of keeping the set local to one CPU. Of course, since all the workers are banging on two shared channels, you’d need `f!` to be expensive enough for this to be useful.

An interesting optimization may be to use a persistent data structure to track the fixed point, share it with the workers, and then filter out the elements on the worker side. This is a valid optimization thanks to the monotonicity of \hat f (i.e., it’s OK to prune elements using a “past” fixed point candidate that under-estimates the latest view).
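The safety argument is just that pruning against a stale (smaller) set can only let extra elements through, never drop genuinely new ones. A tiny sketch with a plain `Set` standing in for a persistent snapshot (`prune` is a made-up name):

```julia
# Pruning against a possibly-stale snapshot of the fixed point candidate.
# By monotonicity of \hat f, a stale snapshot under-estimates the latest
# set, so `prune` never discards a genuinely new element; anything it
# redundantly lets through is deduplicated centrally anyway.
prune(ys, snapshot) = filter(y -> !(y in snapshot), ys)

stale = Set([1, 2])       # the coordinator may already know 3 as well
prune([1, 3, 4], stale)   # == [3, 4]; 3 would be deduplicated centrally
```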

I think it’s still possible to use the feedback skeleton for this, but you’d need a concurrent set, which is hard to implement and hard to make efficient. A problem-specific solution like the one above may not be so bad.
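For reference, the most naive “concurrent set” is just a `Set` behind a `ReentrantLock` (a made-up `LockedSet` sketch). It is correct, but every insertion serializes on the lock, which is exactly the efficiency problem:

```julia
struct LockedSet{T}
    lock::ReentrantLock
    set::Set{T}
end
LockedSet{T}() where {T} = LockedSet{T}(ReentrantLock(), Set{T}())

# Atomically insert `x`; return `true` iff `x` was not already present.
tryinsert!(s::LockedSet, x) = lock(s.lock) do
    x in s.set ? false : (push!(s.set, x); true)
end

s = LockedSet{Int}()
tryinsert!(s, 1)  # true
tryinsert!(s, 1)  # false
```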

Is it always correct to do this, for arbitrary “stack” usage? Doesn’t your test depend on the fact that `11 % 3 != 0`?