Reusing a CachingPool for a pmap that will be run many times

Hi! I’m writing a custom linear map using LinearMaps.jl whose matrix-vector product is embarrassingly parallel, and I would like to parallelize it. This linear map will be applied tens of thousands of times inside an iterative solver, so I would like to speed up each matrix-vector product.

I’ve read that CachingPool can be very useful for speeding up pmap, since using the default WorkerPool can lead to a lot of overhead. See, for example, this pull request: https://github.com/JuliaLang/julia/pull/33892. My question is: since the pmap in my matrix-vector product will itself be run tens of thousands of times, would it be highly beneficial to reuse the same CachingPool across all of these pmap calls, rather than creating a new CachingPool each time I call pmap? I could do this by storing a CachingPool as part of the state of my linear operator. I’m a bit cautious about this for two reasons: I don’t know whether it will actually improve performance (I don’t really understand Julia’s Distributed library), and it makes the code noticeably more complicated, since I have to manage the state of a pool. What are your thoughts?
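For concreteness, here is a minimal sketch of what I have in mind. The names `PooledOp` and `matvec`, the row-block splitting, and the two local workers are all made up for illustration and are not from LinearMaps.jl or my real code. As far as I understand, a CachingPool caches per function object, so the sketch builds the closure once and passes the changing vector `x` as data rather than capturing it.

```julia
using Distributed
addprocs(2)  # assumption: two local worker processes

# Hypothetical operator: the pool and the closure are created once and reused.
struct PooledOp{F}
    pool::CachingPool
    rowblock::F      # closure capturing the constant row blocks
    nblocks::Int
end

function PooledOp(A::Matrix{Float64}, nblocks::Int)
    n = size(A, 1)
    # Split the rows of A into nblocks contiguous ranges.
    ranges = [(((k - 1) * n) ÷ nblocks + 1):((k * n) ÷ nblocks) for k in 1:nblocks]
    blocks = [A[r, :] for r in ranges]
    # The closure captures only `blocks`; each task receives a (k, x) tuple.
    f = t -> blocks[t[1]] * t[2]
    return PooledOp(CachingPool(workers()), f, nblocks)
end

# One matrix-vector product: the same closure and the same pool on every call.
function matvec(op::PooledOp, x::Vector{Float64})
    parts = pmap(op.rowblock, op.pool, [(k, x) for k in 1:op.nblocks])
    return reduce(vcat, parts)
end

A = rand(8, 8)
x = rand(8)
op = PooledOp(A, 4)
y = matvec(op, x)
```

In this sketch every worker ends up holding all of the row blocks, and `x` is still shipped with every task, so it only illustrates the pool reuse, not an efficient data layout. Calling `clear!(op.pool)` once the solver has finished should release what the workers have cached.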