Pmap equivalent for broadcast

Hi all,

I am wrangling with the following problem.

I have a function, call it f, that I want to apply to each element of a vector, and that also takes many other arguments passed as Refs (a DataFrame, matrices, scalars).

Without parallelization, map() did not work for me because it applies the function only to the first element of the vector. broadcast, on the other hand, applies it to all elements of the vector markets.

profit_test = map(retailer_var_profit_loop, markets, Ref(dt2), Ref(nu_alpha), Ref(nu_p), Ref(nu_xj), Ref(nu_xm), Ref(nu_xr), Ref(marketsize), Ref(xi_no_draws))

profit_test = broadcast(retailer_var_profit_loop, markets, Ref(dt2), Ref(nu_alpha), Ref(nu_p), Ref(nu_xj), Ref(nu_xm), Ref(nu_xr), Ref(marketsize), Ref(xi_no_draws))
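
To see the difference in isolation, here is a toy example (g standing in for retailer_var_profit_loop):

g(x, y) = x + y

xs = [1, 2, 3]

map(g, xs, Ref(10))        # => [11]: map zips xs with the 0-dimensional Ref and stops after its single element
broadcast(g, xs, Ref(10))  # => [11, 12, 13]: broadcast treats Ref(10) as a scalar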

Then I wanted to use pmap to speed up this calculation. However, since I cannot get map to work, is there a pmap equivalent of broadcast?

I am trying to use pmap as I have read that it is faster than using @distributed for loops whenever the operation on a single element is time-consuming, which it is in my case.

Thanks!

Put another way: is it possible to adjust my map() call so that it produces the same outcome as broadcast()?

I think it is possible to do this with map (and then to replace with pmap) as follows:

profit_test = map(market -> retailer_var_profit_loop(market,
  dt2, nu_alpha, nu_p, nu_xj, nu_xm,
  nu_xr, marketsize, xi_no_draws), markets)

(note how map treats multiple arguments - see for example its docstring via ?map)
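
For the parallel version, here is a sketch of the corresponding pmap call (assuming you start e.g. 4 local workers, and that retailer_var_profit_loop and whatever packages it uses are defined on every worker - model.jl below is just a hypothetical stand-in for wherever that definition lives):

using Distributed
addprocs(4)  # assumption: 4 local workers, adjust to your machine

# the function must exist on all workers, e.g.:
# @everywhere include("model.jl")  # hypothetical file defining retailer_var_profit_loop

profit_test = pmap(market -> retailer_var_profit_loop(market,
  dt2, nu_alpha, nu_p, nu_xj, nu_xm,
  nu_xr, marketsize, xi_no_draws), markets)

The captured variables (dt2 etc.) are serialized to the workers along with the closure; if they are large, pmap(CachingPool(workers()), market -> ..., markets) avoids re-sending them for every task.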


You were right, that is exactly how it works. Thanks so much!!

I hope you don’t take this the wrong way, but a key piece of general advice is to make sure that whatever function you are trying to parallelize is optimized as far as possible before you look at parallelization. This is both because there are often huge gains to be had for serial execution (if you hang around this forum enough you’ll probably find anything from 2x to 20,000x, with 100x and more quite common), and because unoptimized functions often scale quite badly when parallelized.

I say this because of the snippet you posted yesterday (?) where you e.g. repeatedly defined a function inside a loop, computed the number of matches in a vector relatively inefficiently, accessed variables not defined in the loop, etc. If you can, profile the code you are running to see where most time is spent, then isolate that bit into an MWE and see if you can speed it up by following the Performance tips. If you think you’ve squeezed out all you can but suspect there might be more to be had (e.g. because your code still allocates more than expected), post it here or in #performance-helpdesk on Slack or Zulip and people will be happy to help.
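
For the profiling step, a minimal sketch using the built-in profiler (reusing the map call from above):

using Profile

# collect samples while the serial version runs once
@profile map(market -> retailer_var_profit_loop(market,
  dt2, nu_alpha, nu_p, nu_xj, nu_xm,
  nu_xr, marketsize, xi_no_draws), markets)

Profile.print()  # tree view of where the samples landed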


I would tend to say the opposite — it’s typically easier to parallelize slow, inefficient code than fast code, and it’s easier to parallelize simple algorithms than clever ones … but the parallel slow code is often still worse than the serial fast code, so it’s still better to start by optimizing serial performance.


Yes, maybe I was too categorical - I was mainly thinking of the typical correspondence between unoptimized code and allocations, and of allocations limiting the benefits from multithreading.
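
A quick way to check this in a given case is to @time a single call and look at the allocation count it reports (run it twice - the first call includes compilation):

@time retailer_var_profit_loop(markets[1], dt2, nu_alpha, nu_p,
  nu_xj, nu_xm, nu_xr, marketsize, xi_no_draws)
# the "(N allocations: X MiB)" part of the output is the number to watch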


@nilshg @stevengj Thanks to you both for the comments! Yes, in parallel I am working on optimizing the performance-critical code in my project - the snippet I posted yesterday is something that runs only a few times, whereas other parts will be calculated millions of times. So I am trying to get the latter fast first.