AFAIK, this is still the best thread on this use case. For me, this is a pretty common scenario when parallel programming and I found this thread enlightening, but hard to parse. I’ve now returned to it with fresh eyes, to see if I can distill the essence.
I’m going to use @StevenSiew 's metaphor of treasure hunting to describe the general situation of performing a lengthy computation on each element of a domain (ie. set of input values), where you are only interested in finding any input value that produces a desirable output from the computation. In those cases it makes sense to put multiple cores to work, and then move on as soon as one thread finds a successful input.
Let’s use a toy example, and start with obvious multi-threaded version:
function is_treasure(val)
sleep(0.1)
return val % 23 == 0
end
function find_treasure(haystack)
Threads.@threads for i in haystack
if is_treasure(i)
println("Found treasure: " * string(i))
end
end
end
@time find_treasure(1:100)
As you would expect, treasure is found out of order, and the entire domain is searched:
% julia --threads=auto treasure_hunt.jl
Found treasure: 92
Found treasure: 69
Found treasure: 46
Found treasure: 23
1.495740 seconds (18.51 k allocations: 972.306 KiB, 0.46% compilation time)
The first suggestion in this thread to introduce early termination is with a synchronised flag for each thread to check. Atomic
will do:
function find_treasure_atomic(haystack)
found = Threads.Atomic{Bool}(false)
Threads.@threads for i in haystack
found[] && return
if is_treasure(i)
found[] = true
println("Found treasure: " * string(i))
return
end
end
end
@time find_treasure_atomic(1:100)
Typical results:
% julia --threads=auto treasure_hunt.jl
Found treasure: 92
Found treasure: 69
0.627929 seconds (24.75 k allocations: 1.285 MiB, 3.23% compilation time)
So it helps, but has issues (courtesy of @tkf ):
- It relies on an undefined behavior of
@threads
; i.e., return
terminates the basecase iteration.
- It introduces an atomic read
stop[]
. It may suppress some compiler optimization (even in x86). But I’m not super sure about this.
- It also doesn’t guarantee that threads will see the flag if they’re already busy doing work, as evidenced by the two “Found treasure” lines that sometimes appears in the output.
The second suggestion in this thread is to use the early termination functionality built into Transducers.jl
. Bringing that advice up to date with current versions:
using Transducers
function find_treasure_transducer(haystack)
foldxt(Map(identity), haystack, init = nothing) do _, i
if is_treasure(i)
println("Found treasure: " * string(i))
return reduced(i)
end
return nothing
end
end
@time find_treasure_transducer(1:100)
Which for me produces the perplexing output:
Found treasure: 69
Found treasure: 46
Found treasure: 92
Found treasure: 23
Found treasure: 23
Found treasure: 23
1.408017 seconds (244.96 k allocations: 13.486 MiB, 4.33% compilation time)
Some of this result can probably be explained by the default chunk size basically ensuring that the domain is split evenly between all available cores. Once a thread finishes, there is no more input to trigger another thread to start, so there’s nothing early about early termination. I tried playing with basesize
but didn’t make any progress.
Finally, the last suggestion in this thread is to DYI on top of @spawn
instead of @threads for
. I posit that if this is the best path, that an opportunity exists to close that gap.
So I guess in conclusion, I’m still without an easy and safe solution on par with the ease of @threads for
, for what I consider a common use case. But for most of my purposes, I’ll take my risks with the atomic flag.