I’m currently working on a server for running simulations with our simulation engine. The simulation input is uploaded via a Python Flask API, but the simulation code and the code handling execution are in Julia. Because this is a simulation server, it is supposed to make use of all the compute resources it is given, but the simulation itself is necessarily single-threaded, so the parallel computing comes down to running multiple simulations at the same time.
The current architecture works like this: the API handles requests for uploads, starting simulations, fetching results, etc., and marks a run to be simulated. A separate Julia program runs in an endless loop, scanning directories for runs that have been marked for execution; for each one it spawns a thread to handle the run and then continues the loop.
I almost got this working with multithreading, but I ran into issues with CWD-relative file paths interfering between threads, in particular with the main thread that runs the scanning loop. It turns out that cd(f, dir) changes the CWD for the whole process, and therefore for all threads. I’m sure there is a technical reason for that.
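A minimal sketch of the behavior, not my actual code: the temporary directory and the two channels are made up for illustration and are only there to make the timing deterministic.

```julia
# Sketch: the working directory belongs to the process, not to a task or thread.
dir = mktempdir()      # hypothetical stand-in for a run directory
start = pwd()

entered = Channel{Nothing}(1)
release = Channel{Nothing}(1)

# Hypothetical "simulation" task: enters the run directory and stays there
# until the main task tells it to finish.
worker = Threads.@spawn begin
    cd(dir) do
        put!(entered, nothing)
        take!(release)
    end
end

take!(entered)           # wait until the worker is inside cd(dir)
seen_by_main = pwd()     # what the "scanning loop" sees in the meantime
put!(release, nothing)
wait(worker)

# Expected to be true: the main task observed the worker's directory.
println(realpath(seen_by_main) == realpath(dir))
# cd(f, dir) restores the previous directory once f returns.
println(realpath(pwd()) == realpath(start))
```

With several simulations doing this concurrently, any relative path can end up being resolved against another run's directory.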
My options now are: 1.) refactor the simulation program to not rely on the CWD and always use the run directory as the base path; 2.) use multiprocessing instead, which I briefly tried, but it brings its own bundle of issues to solve; or 3.) somehow find a way to keep the CWD separate for each thread.
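For reference, option 1.) would roughly mean the following. This is only a sketch under assumptions: `runpath`, `simulate`, and the file names are hypothetical and not taken from the real simulation code.

```julia
# Hypothetical helper: resolve a path against an explicit run directory
# instead of the process-wide working directory.
runpath(run_dir::AbstractString, parts::AbstractString...) = joinpath(run_dir, parts...)

# Hypothetical stand-in for a simulation step that reads input and writes output.
function simulate(run_dir::AbstractString)
    input = read(runpath(run_dir, "input.txt"), String)
    write(runpath(run_dir, "output.txt"), uppercase(input))
    return nothing
end

# Two runs in separate directories, executed concurrently without any cd call.
run_dirs = [mktempdir() for _ in 1:2]
for (i, d) in enumerate(run_dirs)
    write(joinpath(d, "input.txt"), "run $i")
end

tasks = [Threads.@spawn simulate(d) for d in run_dirs]
foreach(wait, tasks)

results = [read(joinpath(d, "output.txt"), String) for d in run_dirs]
println(results)
```

The catch is that every file access in the simulation code would have to go through the run directory like this, which is exactly the refactoring effort I can't estimate yet.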
I’m hoping for 3.) because both 1.) and 2.) are unknown but probably significant time investments. Any ideas?