I have a relatively simple script that used to run fine on Julia 1.8.x: it executes lots of relatively small jobs via `pmap`.

Running it on Julia 1.8.5 with `-p 32` creates 32 worker processes, each using approximately 2 GB, and after a few hours the calculation finishes successfully. On 1.9.0 and 1.9.1, however, after a few minutes each Julia process starts to allocate a lot of memory, and once usage reaches about 10 GB per process, all of them get killed by the OOM killer.

Reducing the number of tasks by making each of them process more data in an internal loop does not seem to help.
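For context, the workload follows roughly this pattern (the function and data below are placeholders, not my actual code):

```julia
using Distributed  # script is launched as: julia -p 32 script.jl

# Placeholder for one small, self-contained job.
@everywhere process_chunk(chunk) = sum(abs2, chunk)

# Many relatively small inputs, distributed across the workers by pmap.
chunks = [rand(10_000) for _ in 1:100_000]
results = pmap(process_chunk, chunks)
```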
We have encountered similar issues during CI testing for Trixi.jl. What helped us in the parallel case was to add `--heap-size-hint=1G` to each invocation of Julia, so that the garbage collector runs more aggressively:
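For example (the script name here is just an illustration, not our actual CI setup):

```
julia --heap-size-hint=1G --project=. run_tests.jl
```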
This resolved some issues we had with parallel runs in a CI job on v1.8 as well, so perhaps it will help in your case too.
I tried running my script with `--heap-size-hint=2G`, but it didn’t change anything. Perhaps this flag doesn’t propagate to the Julia worker processes started with the `-p` flag?
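If that turns out to be the case, a possible workaround might be to drop `-p` and start the workers from inside the script via `addprocs`, forwarding the flag explicitly through its `exeflags` keyword (an untested sketch):

```julia
using Distributed

# Start 32 workers, passing the heap size hint to each worker process explicitly,
# since the hint given to the master may not be forwarded automatically.
addprocs(32; exeflags="--heap-size-hint=2G")
```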