Assigning values in threaded nested for-loop

I have some code which looks like this:

Threads.@threads for branch in tree
    Threads.@threads for twig in branch
        append!(biglist, dostuff.(twig))
    end
end

where tree is a vector of vectors (branches) of vectors (twigs) of floats (leaves).

Obviously this isn’t thread safe, so I tried to make it thread safe like this:

function findlength(tree)
    biglistlength = 0  # must be initialized before accumulating
    for branch in tree
        for twig in branch
            biglistlength += length(twig)
        end
    end
    return biglistlength
end
biglist = zeros(findlength(tree))
j = 0
Threads.@threads for branch in tree
    Threads.@threads for twig in branch 
        for (i,leaf) in enumerate(twig)
            biglist[i+j] = dostuff(leaf)
        end
        j += length(twig)
    end
end

But this isn’t thread safe either because there is a race condition in the j += step.

What’s the correct way of going about something like this? I looked at the threading documentation, and the atomics, the locks, and the @floop macro (FLoops.jl) all look very complicated and only almost work here. Atomics and locks seem to reduce the two loops to a serial operation, which defeats the purpose of threading, while the @floop reduction almost works, except that I still need to figure out the indices into my list somehow.

Edit: I know someone will tell me that nested vectors aren’t the way to go, but my real code doesn’t actually look like this, and this kind of construction is definitely what I need.

Just to confirm I understood correctly: you want to flatten the tree into an array? Do you care about the order?

Actually no, I have edited my original post to clarify what I meant. There is a function (dostuff) which acts on the innermost elements, and I want to put the results in a list.

If you don’t insist on for loops, Transducers.jl provides an easy way:

biglist = tcollect(MapCat(branch -> tcollect(MapCat(twig -> dostuff.(twig)), branch)), tree)

You can easily test different ways to parallelize, i.e., none using collect, threaded using tcollect, or distributed using dcollect.
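As a small sketch of that switch (assuming Transducers.jl is installed; dostuff and the sample tree are hypothetical stand-ins, not the real code from the question), the same transducer can be passed to either collect or tcollect. As far as I know, tcollect keeps the input order, so both results should agree:

```julia
using Transducers

# Hypothetical stand-in for the per-leaf work.
dostuff(leaf) = 2leaf

# Small sample tree: vector of branches, each a vector of twigs of floats.
tree = [[[1.0, 2.0], [3.0]], [[4.0, 5.0, 6.0]]]

# One transducer describing the flattening; only the executor changes below.
xf = MapCat(branch -> collect(MapCat(twig -> dostuff.(twig)), branch))

serial   = collect(xf, tree)   # single-threaded
threaded = tcollect(xf, tree)  # multi-threaded over branches
```

Start Julia with several threads (e.g. julia -t 4) to actually see a difference in run time.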

Does this work if dostuff returns multiple outputs?

Sorry, I had somehow missed the append!, or read it as push! for that matter. To collect all results into a single output, just add another MapCat instead of broadcasting, i.e., twig -> collect(MapCat(dostuff), twig).
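Spelled out in full, that could look roughly like the following. This is only a sketch: it assumes dostuff returns a tuple of outputs per leaf (a made-up example function), and the helper names flatten_twig and flatten_branch are introduced here just for readability.

```julia
using Transducers

# Hypothetical dostuff that returns two outputs per leaf.
dostuff(leaf) = (leaf, -leaf)

tree = [[[1.0, 2.0], [3.0]], [[4.0]]]

# MapCat concatenates whatever iterable dostuff returns for each leaf.
flatten_twig(twig) = collect(MapCat(dostuff), twig)
flatten_branch(branch) = collect(MapCat(flatten_twig), branch)

# Only the outermost level is threaded here.
biglist = tcollect(MapCat(flatten_branch), tree)
```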

I also found that recent updates to Transducers.jl make it possible to get rid of the inner collect calls, i.e., to avoid allocating intermediate storage (just like your code):

tree |> MapCat(branch -> branch |> MapCat(twig -> twig |> MapCat(dostuff))) |> tcollect
tree |> MapCat(branch -> branch |> MapCat(twig -> twig |> Map(dostuff))) |> foldxt(append!!)

The last version uses foldxt (like reduce, but multi-threaded in a tree-like fashion) and combines all results with append!! (the fast append from BangBang.jl), which makes it even more similar to your code.
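If you would rather keep plain loops, the j += race from the question can be avoided by computing every twig's start offset up front, so each task writes to its own disjoint index range. The following is a minimal sketch using only Base; flatmap_threaded is a hypothetical helper name, dostuff is a stand-in, and it assumes dostuff returns exactly one Float64 per leaf:

```julia
# Hypothetical stand-in for the per-leaf work.
dostuff(leaf) = 2leaf

function flatmap_threaded(f, tree)
    # Collect references to all twigs (the leaf data itself is not copied).
    twigs = [twig for branch in tree for twig in branch]
    lens = length.(twigs)
    # offsets[k] = number of leaves stored before twig k (exclusive prefix sum).
    offsets = cumsum(lens) .- lens
    out = zeros(sum(lens))
    # Twig k owns indices offsets[k]+1 : offsets[k]+lens[k], so no two
    # tasks write to the same slot and no lock or atomic is needed.
    Threads.@threads for k in eachindex(twigs)
        for (i, leaf) in enumerate(twigs[k])
            out[offsets[k] + i] = f(leaf)
        end
    end
    return out
end

tree = [[[1.0, 2.0], [3.0]], [[4.0, 5.0, 6.0]], [[7.0], [8.0, 9.0]]]
biglist = flatmap_threaded(dostuff, tree)
```

Flattening to a single loop over twigs also sidesteps the nested @threads, and the output order matches the serial traversal of the tree.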