Best data structure for recursive functions?

phantom · March 29, 2024, 1:12pm

Hi Hope everyone is having a good friday.

Suppose a function tests for an initial condition prior to running an inner function on elements of a series of datasets. If the initial condition fails, the function skips to the next dataset. After iterating through all the datasets it repeats the process on all the incomplete datasets.

A crude example might be.

using Threads 

g1 = [ 1,2,3]
g2 = [ 4,5,6]
testpop = [g1,g2] 

function main(testpop, cont = Ref(true))
  untested = [ ]
  for group in testpop
        condition = second(now())/10    
        @spawn for people in group
              cont [ ] || break
              if condition <= rand(1:12) 
                 push!(untested, group) 
                 break 
              else   
                 fun1(people)
                 popfirst!(group)
              end
        end
        return(cont)
   end 
   !isempty(untested) && main(untested)  
end

2 quick questions.

Given stvenjg’s post about tail recursive algorithms not having much advantages in julia, should a different approach be employed as testpop and gn get larger? Is it helpful/possible to enlarge the base case here?
Assuming recursion is used, would it be helpful to use different data structures like channels instead of arrays?

Sorry for the basic questions. Just want to make sure i’m going about this correctly in julia. Thanks!

stevengj · March 29, 2024, 2:14pm

Your tail recursion here can be trivially replaced by a while loop (like all self tail recursion). Do that.

You are also using untyped containers (the Any array []).

Parallel mutation of an array like this isn’t thread safe. You need a lock or a different data structure to avoid a race condition.

phantom · March 29, 2024, 10:15pm

Wow thanks so much for this and your other posts on recursion.

Originally I used @async and Ref to break the loop with an external command. However the warning in the docs encouraged the use of @spawn over @async. Am I correct in understanding that in cases involving the parallel mutation of arrays @async is still preferable over @spawn becuase it disables the migration of the parent task across worker threads?

With a while loop it seems like breaking the loop with an external command becomes more periphrastic than using tail end recursion. Something like

function main(testpop, cont = Ref(true))
    @async while all(!isempty,testpop) && cont[] 
        for group in testpop
            condition = second(now())/10    
            @async for people in group
                cont [] || break      
                if condition <= rand(1:12)  
                    break 
                else   
                    fun1(people)
                    popfirst!(group)
                end
            end
            return(cont)
        end 
    end
    return(cont) 
end

When you mention in this post that

Did you mean that there are meaningful disadvantages to tail end recursion in julia? Or is it just bad practice in general?

stevengj · March 29, 2024, 10:26pm

No @async is used for asynchronous tasks that aren’t necessarily parallel at all. If you are running everything on a single thread (and just switching back and forth between tasks) then race conditions are less of an issue (since you have more control over when task switches occur), but that’s not helpful if you actually want your tasks running on parallel hardware threads.

Yes, tail recursion pushes a new stack frame while a loop does not. Because self tail recursion is (by definition) trivially rewritable as a loop, there’s basically no reason to use it compared to a loop in Julia.

(As opposed to non-tail recursion, which can be quite handy as I explained in the thread you linked.)

Dan · March 29, 2024, 11:01pm

It’s better not to use continue as a parameter name since it is a Julia keyword used as part of the for loop construct:

for .... in ...
    break
    continue
end

(use ?continue in the REPL for details)

phantom · March 29, 2024, 11:20pm

Thanks so much, that clarifies things a lot. I’m still confused about this note about thread safety where you mention

Presumably a data structure like Channels would alleviate that problem? Are there other data structures that could be appropriate here?

Also you mention

However the warning in the docs reads

It is strongly encouraged to favor Threads.@spawn over @async always even when no parallelism is required especially in publicly distributed libraries. This is because a use of @async disables the migration of the parent task across worker threads in the current implementation of Julia.

Are arrays the exception to this warning? Should a caveat be that in cases where thread safety is a bigger concern than parallelism i.e. the present use of arrays, @async is actually preferred because it might serve as the “lock” you mentioned above?

phantom · March 29, 2024, 11:26pm

Thanks so much for pointing that out! I edited it to cont so it doesn’t mislead me or other students in the future…or distort some LLM.

Topic		Replies	Views
Help with Threading New to Julia question	10	1551	July 4, 2020
The right mental model of `@threads`? General Usage parallel	6	1302	June 3, 2020
@async macro not recommended? General Usage	5	450	April 3, 2024
Iterating through an inner loop with @async New to Julia question , async	8	616	June 30, 2023
Recursive multithreading General Usage	2	640	September 3, 2018

Best data structure for recursive functions?

Related topics