I’m using the Distributed package to parallelize a for loop in a Monte Carlo simulation. The loop runs over several initial conditions and generates a CSV file of L/2 × N entries, where L is the size of the system I’m considering and N is the number of different initial conditions.
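For context, the structure of my code is roughly the sketch below (the actual simulation is more involved; `run_simulation`, the `pmap`-based loop, and the worker count are simplified placeholders for what I actually run):

```julia
using Distributed
addprocs(4)  # placeholder worker count

@everywhere using Random

# placeholder for the real Monte Carlo kernel: returns a vector of length L ÷ 2
@everywhere function run_simulation(L::Int, seed::Int)
    rng = MersenneTwister(seed)
    # ... Monte Carlo sweeps over a system of size L ...
    return rand(rng, L ÷ 2)  # stand-in for the measured observable
end

using CSV, Tables

L, N = 128, 40
results = pmap(seed -> run_simulation(L, seed), 1:N)  # one run per initial condition

# assemble an (L/2) × N matrix, one column per initial condition, and write it out
CSV.write("results.csv", Tables.table(reduce(hcat, results)))
```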
For L = 64 and N = 40 the code works perfectly well. However, as soon as I increase L (for example to L = 128), each worker throws an error, either “Worker # terminated. Unhandled Task ERROR: EOFError: read end of file” or “Worker # terminated. Unhandled Task ERROR: IOError: read: connection reset by peer (ECONNRESET)”, one for every worker, until a final “LoadError: TaskFailedException nested task error: ProcessExitedException(2)” is displayed.
I really have no idea what could be going wrong. It seems to be a memory problem, but in principle the clusters I’m working on should handle the quantities I’m inserting (on average they have 32 GB of RAM and CPU clock speeds of 2.2 to 4.5 GHz).
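To test the memory hypothesis, I could log the peak resident memory on each worker while the jobs run; a diagnostic sketch (not from my actual code):

```julia
using Distributed

# diagnostic sketch: report peak resident set size on each live worker
for w in workers()
    rss_gib = remotecall_fetch(() -> Sys.maxrss() / 2^30, w)
    println("worker $w: maxrss ≈ $(round(rss_gib, digits = 2)) GiB")
end
```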