I have a large array of a complex mutable struct type. The whole array is processed iteratively, but each element can be processed independently of the others in a given iteration. The number of iterations required to complete varies, so I can’t loop through it a predictable number of times. The ideal is therefore for the struct to have a finished field - then I could call pmap on the whole array repeatedly until every element has finished = true.
I can’t do this because my struct isn’t a bits type, so pmap can’t deal with it. My backup plan is to have a separate Status array of Bools the same size as the data array and use that as the basis of the pmap calls. So I can pass the correct element of the data array into my function, and return a Bool to update the corresponding element in the Status array. Something like this hypothetical code:
using Distributed, SharedArrays

function process(dataEntry::MyStruct, dataStatus::Bool)::Bool
    finished = dataStatus
    if !finished
        # Process the dataEntry
        # Set finished = true once this entry needs no further iterations
    end
    return finished
end

data::Array{MyStruct}
status::SharedArray{Bool}
# Loop until every entry reports finished (== compares; .= would overwrite status)
while !all(status)
    status = pmap(process, data, status)
end
Is that a reasonable approach, or is there a better way?
(I hope the question is clear - if not, shout at me and I’ll try to explain further…)
First thing: how do you generate/acquire your structs? Would it be possible to generate sets of them on each node/process that’ll be processing them? If so, a better approach might be the following:
A driver program on the master node launches the worker processes, and splits up the generation/acquisition of your structs into chunks of a reasonable size.
Each worker receives at least 1 chunk, and processes the entire chunk.
The worker then returns the results to the master node, or writes them to a file, or what have you.
The worker is then given another chunk to work on, if any are left.
The benefits of this approach are numerous:
It’s simple from the master’s perspective: only a for loop and some sort of remotecall_* or remote_do is necessary.
You don’t need to transfer an entire Status vector across the network, only the instructions for which structs to process (which could easily be a UnitRange or similarly tiny object).
You allow the potentially expensive generation/acquisition of your structs to be done in parallel.
Even if your structs must be generated/acquired on a single node, you could just send them to each worker manually and call that your “chunk”.
EDIT: I didn’t notice the sum reduction step when I first wrote this. I’d recommend doing that reduction over the processed structs on each node and returning only the resulting value to the master for a final reduction, which should be a ton quicker than transferring the structs back to the master to be reduced.
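To make the chunked approach concrete, here is a rough sketch rather than a definitive implementation. Everything named here (MyStruct's fields, make_struct, step!, process_chunk, the chunk size of 100, the two local workers) is made up for illustration, and I've used pmap over the ranges instead of a hand-written remotecall_* loop because it already hands the next chunk to whichever worker is free. Each worker generates its own structs, iterates each one until it finishes, and sends back only a count.

```julia
using Distributed
nprocs() == 1 && addprocs(2)   # assumption: two local workers; adjust for a real cluster

@everywhere begin
    # Hypothetical stand-in for the real struct.
    mutable struct MyStruct
        value::Float64
        finished::Bool
    end

    # Hypothetical generator: builds struct number i on the worker itself.
    make_struct(i::Int) = MyStruct(float(i), false)

    # Hypothetical single iteration: halve the value, finish once it drops below 1.
    function step!(s::MyStruct)
        s.value /= 2
        s.finished = s.value < 1
        return s
    end

    # Generate and fully process one chunk; return only the number of finished structs.
    function process_chunk(r::UnitRange{Int}; maxiter::Int = 1_000)
        nfinished = 0
        for i in r
            s = make_struct(i)
            iter = 0
            while !s.finished && iter < maxiter
                step!(s)
                iter += 1
            end
            nfinished += s.finished
        end
        return nfinished
    end
end

n = 1_000
chunks = [i:min(i + 99, n) for i in 1:100:n]   # UnitRanges are tiny to send

# Each call returns one Int, so very little travels back to the master.
counts = pmap(process_chunk, chunks)
total_finished = sum(counts)
println("finished $(total_finished) of $n")
```

The maxiter cap is only there so a struct that never converges can't hang a worker; drop it or tune it as needed.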
I had assumed from the OP that the iterative method was being applied to the individual structs independently, but if that’s not the case, then I think I’d need a specific example of what @squaregoldfish is actually doing with these structs.
I’ll have to check my Julia version is up to date and test it again. I might still be on 1.0, but I can’t check right now.
There’s a lot of good food for thought amongst your suggestions too. I should be able to get something better working with a bit of experimentation. I’ll report back…
So it was SharedArray that couldn’t handle the struct, not pmap. Apologies for misleading.
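For anyone who hits the same thing, a small check along these lines shows the difference. MyStruct and its fields are placeholders here, not the real type. SharedArray only accepts bits-type elements and throws an ArgumentError otherwise, while pmap serializes its arguments and so accepts ordinary structs.

```julia
using Distributed, SharedArrays

@everywhere begin
    # Hypothetical stand-in for the real struct.
    mutable struct MyStruct
        value::Float64
        finished::Bool
    end
end

# Mutable structs are never bits types; Bool is.
struct_is_bits = isbitstype(MyStruct)
bool_is_bits = isbitstype(Bool)

# SharedArray rejects a non-bits element type when it is constructed.
shared_ok = try
    SharedArray{MyStruct}(4)
    true
catch err
    err isa ArgumentError || rethrow()
    false
end

# pmap is happy with an ordinary Array of structs.
data = [MyStruct(float(i), false) for i in 1:4]
flags = pmap(s -> s.value > 2, data)
```

One caveat: with remote workers, pmap sends each worker a copy of the struct, so mutating it there does not change the element held on the master. The function has to return whatever the master needs.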
Having thought about it I can indeed generate my structs on the fly on the processing nodes, so there’s no need for that array. So I just need to return true/false from each call and reduce them to see if all the structs are complete or not.
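A minimal sketch of that last step, assuming the per-struct work can be wrapped in one function that returns a Bool. run_one and its halving loop are invented placeholders for the real generation and processing. A @distributed loop with (&) as the reducer combines the flags on the workers and hands back a single Bool.

```julia
using Distributed

@everywhere begin
    # Hypothetical: build struct number i, iterate it, report whether it completed.
    function run_one(i::Int; maxiter::Int = 50)
        value = float(i)
        for _ in 1:maxiter
            value /= 2                   # placeholder for one processing iteration
            value < 1 && return true     # finished
        end
        return false                     # gave up after maxiter iterations
    end
end

n = 1_000

# (&) folds the per-struct flags into a single "everything finished" answer.
all_done = @distributed (&) for i in 1:n
    run_one(i)
end

println(all_done ? "all structs complete" : "some structs did not finish")
```

If it matters which structs failed rather than just whether any did, pmap(run_one, 1:n) returns the individual flags instead.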