Suppose a function tests for an initial condition prior to running an inner function on elements of a series of datasets. If the initial condition fails, the function skips to the next dataset. After iterating through all the datasets it repeats the process on all the incomplete datasets.
A crude example might be.
using Threads
g1 = [ 1,2,3]
g2 = [ 4,5,6]
testpop = [g1,g2]
function main(testpop, cont = Ref(true))
untested = [ ]
for group in testpop
condition = second(now())/10
@spawn for people in group
cont [ ] || break
if condition <= rand(1:12)
push!(untested, group)
break
else
fun1(people)
popfirst!(group)
end
end
return(cont)
end
!isempty(untested) && main(untested)
end
2 quick questions.
Given stvenjg’s post about tail recursive algorithms not having much advantages in julia, should a different approach be employed as testpop and gn get larger? Is it helpful/possible to enlarge the base case here?
Assuming recursion is used, would it be helpful to use different data structures like channels instead of arrays?
Sorry for the basic questions. Just want to make sure i’m going about this correctly in julia. Thanks!
Wow thanks so much for this and your other posts on recursion.
Originally I used @async and Ref to break the loop with an external command. However the warning in the docs encouraged the use of @spawn over @async. Am I correct in understanding that in cases involving the parallel mutation of arrays @async is still preferable over @spawn becuase it disables the migration of the parent task across worker threads?
With a while loop it seems like breaking the loop with an external command becomes more periphrastic than using tail end recursion. Something like
function main(testpop, cont = Ref(true))
@async while all(!isempty,testpop) && cont[]
for group in testpop
condition = second(now())/10
@async for people in group
cont [] || break
if condition <= rand(1:12)
break
else
fun1(people)
popfirst!(group)
end
end
return(cont)
end
end
return(cont)
end
No @async is used for asynchronous tasks that aren’t necessarily parallel at all. If you are running everything on a single thread (and just switching back and forth between tasks) then race conditions are less of an issue (since you have more control over when task switches occur), but that’s not helpful if you actually want your tasks running on parallel hardware threads.
Yes, tail recursion pushes a new stack frame while a loop does not. Because self tail recursion is (by definition) trivially rewritable as a loop, there’s basically no reason to use it compared to a loop in Julia.
(As opposed to non-tail recursion, which can be quite handy as I explained in the thread you linked.)
It is strongly encouraged to favor Threads.@spawn over @async always even when no parallelism is required especially in publicly distributed libraries. This is because a use of @async disables the migration of the parent task across worker threads in the current implementation of Julia.
Are arrays the exception to this warning? Should a caveat be that in cases where thread safety is a bigger concern than parallelism i.e. the present use of arrays, @async is actually preferred because it might serve as the “lock” you mentioned above?