Thank you for the suggestions!
Yes, I started experimenting with these settings to see how they affect solve time, which is a high priority for our app given that users submit thousands of jobs per day. However, I ran into this issue: "unrecognized control parameter RESOURCESTRATEGY" (Issue #130, jump-dev/Xpress.jl on GitHub).
I have also looked into using Kubernetes to restart the Julia pods based on some metric. Unfortunately, I have come up empty-handed so far (we are not iterating in the app, so there is no distinct counter of problems solved). The only idea that would work is a cron job that restarts the pods every day. However, that is just a band-aid, and I would rather find the true problem and fix it.
I have not found a reason to contact FICO yet as I have not identified any issue with Xpress. The problem, as you noted, appears to be in Julia.
I will keep pursuing this avenue in parallel.
Just to note: the app is not iterating over problems and is running 12-20 Julia instances. We have no control over how complex any given problem is, nor over how many problems are run in any given time interval (except by limiting user POSTs). We would like to process as many jobs in parallel as possible to give users the best possible experience. Unfortunately, all solutions so far (using 1 THREAD or limiting the solver’s memory use) will lead to slower jobs, which negates thousands of hours of work spent making the app faster.