Pmap with complex structs

squaregoldfish · December 4, 2018, 7:24pm

I have a large array of a complex mutable struct type. The whole array is processed iteratively, but each element can be processed independently of the others in a given iteration. The number of iterations required to complete varies, so I can’t loop through it a predictable number of times. The ideal is therefore for the struct to have a finished field - then I could call pmap on the whole array repeatedly until every element has finished = true.

I can’t do this because my struct isn’t a bits type, so pmap can’t deal with it. My backup plan is to have a separate Status array of Bools the same size as the data array and use that as the basis of the pmap calls. So I can pass the correct element of the data array into my function, and return a Bool to update the corresponding element in the Status array. Something like this hypothetical code:

function process(dataEntry::MyStruct, dataStatus::Bool)::Bool
  finished = dataStatus

  if !finished  
    # Process the dataEntry
    # Set finished flag
  end
  
  return finished
end
 

# Assumed to exist already:
#   data::Array{MyStruct}
#   status::SharedArray{Bool}   (initially all false)

# Keep going until every element reports finished
while !all(status)
  status = pmap(process, data, status)
end

Is that a reasonable approach, or is there a better way?

(I hope the question is clear - if not, shout at me and I’ll try to explain further…)

jpsamaroo · December 4, 2018, 8:33pm

First thing: how do you generate/acquire your structs? Would it be possible to generate sets of them on each node/process that’ll be processing them? If so, a better approach might be the following:

1. A driver program on the master node launches the worker processes, and splits up the generation/acquisition of your structs into chunks of a reasonable size.
2. Each worker receives at least 1 chunk, and processes the entire chunk.
3. The worker then returns the results to the master node, or writes it to a file, or what have you.
4. The worker is then given another chunk to work on, if any are left.
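The steps above could be sketched roughly as follows. This is only an illustration under assumptions: Item, make_item, run_to_completion! and process_chunk are made-up names, the halving loop is a placeholder for the real per-struct iteration, and pmap stands in for a hand-written remotecall loop because it already hands the next chunk to whichever worker is free.

```julia
using Distributed
addprocs(2)  # arbitrary worker count for the demo

# Hypothetical struct, generated on the worker rather than sent from the master.
@everywhere mutable struct Item
    id::Int
    value::Float64
    finished::Bool
end

# Placeholder for the real generation/acquisition step.
@everywhere make_item(id::Int) = Item(id, Float64(id), false)

# Placeholder iteration: halve the value until it drops below 1,
# and report how many iterations that took.
@everywhere function run_to_completion!(item::Item)
    iterations = 0
    while !item.finished
        item.value /= 2
        iterations += 1
        item.finished = item.value < 1
    end
    return iterations
end

# Runs on a worker: build the structs for one chunk and process all of them.
@everywhere function process_chunk(ids::UnitRange{Int})
    return [run_to_completion!(make_item(id)) for id in ids]
end

n, chunk_size = 10, 4
# Each chunk is just a UnitRange, so very little data is sent to the workers.
chunks = [i:min(i + chunk_size - 1, n) for i in 1:chunk_size:n]

per_chunk = pmap(process_chunk, chunks)   # results come back in chunk order
iterations = reduce(vcat, per_chunk)
println(iterations)
```

Only the index ranges travel to the workers and only the per-item results travel back; the structs themselves never leave the process that created them.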

The benefits of this approach are numerous:

- It’s simple from the master’s perspective: only a simple for loop and some sort of remotecall_* or remote_do is necessary.
- You don’t need to transfer an entire Status vector across the network, only the instructions for which structs to process (which could easily be a UnitRange or similarly tiny object).
- You allow the potentially expensive generation/acquisition of your structs to be done in parallel.

Even if your structs must be generated/acquired on a single node, you could just send them to each worker manually and call that your “chunk”.

EDIT: I didn’t notice the sum reduction step when I first wrote this. I’d recommend doing that reduction over the processed structs on each node and returning only the reduced value to the master for a final reduction, which should be a lot quicker than transferring the structs back to the master to be reduced.

bennedich · December 4, 2018, 9:10pm

Perhaps I’m missing something, but I don’t see how this would work with an iterative method like in the OP?

jpsamaroo · December 4, 2018, 9:38pm

I had assumed from the OP that the iterative method was being applied to the individual structs independently, but if that’s not the case, then I think I’d need a specific example of what @squaregoldfish is actually doing with these structs.

affans · December 5, 2018, 3:50am

Why can’t pmap handle your struct? I’m fairly certain I’ve used it with my rather complicated struct, although mine may still have been isbits.

jpsamaroo · December 5, 2018, 4:34pm

Indeed it does. I just tested a simple pmap with a simple non-isbitstype struct and it worked perfectly fine.
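A check along these lines might look like the sketch below. It is not the test that was actually run: Job and advance are hypothetical names and the worker count is arbitrary. Since pmap sends each worker a serialized copy of its argument, the function returns the updated struct instead of relying on in-place mutation being visible on the master.

```julia
using Distributed
addprocs(2)  # arbitrary worker count for the demo

# Hypothetical non-isbits type: mutable, with a heap-allocated Vector field.
@everywhere mutable struct Job
    values::Vector{Float64}
    finished::Bool
end

# Hypothetical processing step. The worker operates on its own copy,
# so the updated struct is returned to the caller.
@everywhere function advance(job::Job)
    job.values .*= 2
    job.finished = true
    return job
end

jobs = [Job(fill(Float64(i), 3), false) for i in 1:4]
println(isbitstype(Job))                  # false

results = pmap(advance, jobs)             # structs are serialized both ways
println(all(j -> j.finished, results))    # true
```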

squaregoldfish · December 5, 2018, 5:53pm

I’ll have to check my Julia version is up to date and test it again. I might still be on 1.0, but I can’t check right now.

There’s a lot of good food for thought amongst your suggestions too. I should be able to get something better working with a bit of experimentation. I’ll report back…

squaregoldfish · December 6, 2018, 5:59pm

So it was SharedArray that couldn’t handle the struct, not pmap. Apologies for misleading.

Having thought about it I can indeed generate my structs on the fly on the processing nodes, so there’s no need for that array. So I just need to return true/false from each call and reduce them to see if all the structs are complete or not.
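A minimal sketch of that plan, assuming each struct can be built from nothing more than its index: Entry, process_entry and the max_iter cap are hypothetical, and the halving step is a placeholder for one real iteration.

```julia
using Distributed
addprocs(2)  # arbitrary worker count for the demo

# Hypothetical struct, created on the worker from an index alone.
@everywhere mutable struct Entry
    value::Float64
    finished::Bool
end

# Hypothetical per-element routine: generate the struct locally, iterate up to
# max_iter times, and report only whether it finished.
@everywhere function process_entry(id::Int; max_iter::Int = 3)
    entry = Entry(Float64(id), false)
    for _ in 1:max_iter
        entry.value /= 2                   # placeholder for one real iteration
        entry.finished = entry.value < 1
        entry.finished && break
    end
    return entry.finished
end

flags = pmap(process_entry, 1:10)   # one Bool per element comes back
println(all(flags))                 # false: some elements hit the cap
println(findall(!, flags))          # [8, 9, 10] still need work
```

With this shape, only an Int goes out and a Bool comes back per element, and all(flags) is the whole reduction on the master.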
