Hi all,

I am trying to parallelize an operation over an index of a DataFrame. First, I build one tuple of (market data, matching demographics) per market id, then `pmap` over the indices:
```julia
list_of_dts = [begin
    # All rows of dt belonging to this market
    dt_mk = dt[dt.market_ids .== id, :]
    date = first(unique(dt_mk.quarter))
    dma = first(unique(dt_mk.dma_code))
    # Pair them with the matching demographic rows
    (dt = dt_mk,
     demo_dt = demo_dt[(demo_dt.quarter .== date) .& (demo_dt.dma_code .== dma), :])
end for id in unique(dt.market_ids)]

delta_nfxp = pmap(index -> nfxp_market(index, list_of_dts, nu_demo, nu_cols,
                                       random_coeff, demo_coeff, demo_var),
                  1:length(list_of_dts))
delta_nfxp = reduce(vcat, delta_nfxp)
```
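(As an aside, I believe the first step could also be written with `groupby`; this is just an untested sketch, assuming each market has a single quarter and dma_code, as the `first(unique(...))` calls above imply:)

```julia
using DataFrames

# Sketch: one tuple per market, built from the grouped DataFrame
list_of_dts = [
    (dt = DataFrame(g),
     demo_dt = demo_dt[(demo_dt.quarter .== first(g.quarter)) .&
                       (demo_dt.dma_code .== first(g.dma_code)), :])
    for g in groupby(dt, :market_ids)
]
```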
Inside `nfxp_market`, `list_of_dts` is subset by the index:

```julia
dt_mk = list_of_dts[index][:dt]
demo_dt_mk = list_of_dts[index][:demo_dt]
```
I think the idea is clear. However, when I actually run this, my workers keep getting killed, which I believe is an out-of-memory error. The workers are not all on the same node; for simplicity, say I have 2 nodes with 16 cores each (i.e. 32 - 1 = 31 workers).
To reduce memory use, it would make sense not to export the entire dataset to each worker and then subset it there. In R, for example, I can easily export just the subset meant for a given worker. Is something like this possible in Julia?
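What I have in mind is something like the following, where `pmap` iterates over `list_of_dts` directly, so each worker should only receive the tuples it actually processes. (This is only a sketch: `nfxp_market_sub` is a hypothetical variant of `nfxp_market` that takes the already-subset DataFrames instead of an index into the full list.)

```julia
using Distributed

# Map over the per-market tuples themselves instead of over indices.
# pmap then serializes only the (dt, demo_dt) pair a given worker needs,
# rather than capturing the whole list_of_dts in the closure.
delta_nfxp = pmap(list_of_dts) do market
    nfxp_market_sub(market.dt, market.demo_dt, nu_demo, nu_cols,
                    random_coeff, demo_coeff, demo_var)
end
delta_nfxp = reduce(vcat, delta_nfxp)
```

Would that actually avoid shipping the full list to every worker, or am I misunderstanding how `pmap` serializes its arguments?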