I have multithreaded code that I am running on a cluster using Julia v1.10, and it allocates many temporary arrays. The computation is quite linear-algebra heavy, and I would like memory to be freed regularly while it runs. Assuming that I need 10 threads to carry out my calculation, would a good strategy be to request 10+n CPUs and start Julia with julia -t 10 --gcthreads n,1? So with 15 CPUs, this would be julia -t 10 --gcthreads 5,1? Would having free CPUs for the GC threads help, or is this unnecessary? Also, would the 1 dedicated thread for the concurrent sweep phase make a difference? In that case, would requesting 16 CPUs be a better idea?
Sorry about the vague question, but I am just looking for general guidelines that I can play around with.
For this you should likely use julia -t 15 --gcthreads 8,1. Julia doesn’t have a concurrent GC, so when the GC is running, the regular threads aren’t running. The GC threads can therefore run on the cores that the paused compute threads were using, and CPUs held in reserve for the GC would simply sit idle the rest of the time. If you are seeing a substantial amount of time spent in GC, it might be worth reducing the number of temporary arrays needed (e.g. via preallocation). Once you start scaling up threads (e.g. running on 30-60 cores), GC can become a bottleneck if not mitigated, since garbage collection (pretty much inherently) doesn’t scale perfectly with thread count. That said, if your arrays are relatively large, I would expect garbage collection to be pretty quick.
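To check whether GC time is actually a problem before tuning flags, you can compare an allocating loop against a preallocated one. The following is only a minimal sketch: the function names (work_alloc, work_prealloc!, work_threaded), the matrix sizes, and the trace-of-product workload are made up for illustration and are not from the original code. It assumes square Float64 matrices and a non-empty input list, and it relies on @timed reporting a gctime field and on Threads.ngcthreads being available (guarded here in case it is not).

```julia
using LinearAlgebra

# Allocating version: every iteration creates a fresh temporary for A * B.
function work_alloc(As, B)
    s = 0.0
    for A in As
        C = A * B          # new array on each pass, later reclaimed by the GC
        s += tr(C)
    end
    return s
end

# Preallocated version: the caller supplies one buffer C, overwritten in place.
function work_prealloc!(C, As, B)
    s = 0.0
    for A in As
        mul!(C, A, B)      # in-place product, no new array
        s += tr(C)
    end
    return s
end

# Threaded variant: the input is split into chunks, and each task gets its own
# buffer so that tasks never write to the same memory.
function work_threaded(As, B; ntasks = Threads.nthreads())
    chunks = Iterators.partition(As, cld(length(As), ntasks))
    tasks = map(chunks) do chunk
        Threads.@spawn work_prealloc!(similar(B), chunk, B)
    end
    return sum(fetch, tasks)
end

# Report the thread configuration the session was started with.
println("compute threads: ", Threads.nthreads())
if isdefined(Threads, :ngcthreads)
    println("GC threads: ", Threads.ngcthreads())
end

# Hypothetical workload, small enough to run quickly.
n = 200
As = [rand(n, n) for _ in 1:200]
B = rand(n, n)

# Warm up so that compilation is not included in the measurements.
work_alloc(As, B)
work_threaded(As, B)

for (name, f) in (("allocating", () -> work_alloc(As, B)),
                  ("preallocated", () -> work_threaded(As, B)))
    stats = @timed f()
    println(name, ": ", stats.time, " s total, ", stats.gctime,
            " s in GC, ", stats.bytes, " bytes allocated")
end
```

If the GC share reported for the allocating version is small, extra GC threads are unlikely to matter much. If it is large, reusing buffers in this style is probably worth trying before adjusting --gcthreads.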