I am running a fairly large optimization code (which I can't share). The big picture is that I use a modified version of Tamas Papp's MultistartOptimization.jl to estimate a non-linear model.
When I significantly increase the number of Sobol points the optimization uses, Julia simply crashes. The code seems to run without problems for a while, but then I get:
```
Julia has exited. Press Enter to start a new session.
```
Any idea of what might be causing this?
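For reference, in the stock package the number of Sobol points is the argument to TikTak. A minimal sketch of the documented usage with a placeholder objective (my actual code differs, but the knob is the same):

```julia
using MultistartOptimization, NLopt

f(x) = sum(abs2, x)                              # placeholder objective
P = MinimizationProblem(f, -ones(10), ones(10))  # objective plus box bounds
local_method = NLoptLocalMethod(NLopt.LN_BOBYQA) # local solver for each start
multistart_method = TikTak(100)                  # 100 = number of Sobol points
result = multistart_minimization(multistart_method, local_method, P)
```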
I tried running the optimization with fewer points, and it works. However, when I run it in the Terminal rather than in Atom, I get weird behavior regardless of the number of points.
I updated Juno as explained here, and the problem persists.
My first guess would be OOM (out of memory). Could you run the Julia REPL from an OS command-line shell? At some point you should then see a better indication of the problem. You could also watch your system monitor's memory readout while the program runs.
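If it helps, you can also log memory from inside Julia as the run progresses; `Sys.total_memory` and `Sys.free_memory` report physical RAM in bytes. A small sketch (the helper name is made up):

```julia
# Print how much physical RAM is in use at a given point in the program.
function log_memory(tag = "")
    used_gb  = (Sys.total_memory() - Sys.free_memory()) / 2^30
    total_gb = Sys.total_memory() / 2^30
    @info "memory" tag used_gb total_gb
end

log_memory("before optimization")
# ... run the optimization ...
log_memory("after optimization")
```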
I have no experience reading these, but I guess it's telling me that RAM is almost full, so if I increase the size of the optimization problem, I'll run out?
Is there a way around this? I'm also running the same code on a computer with fewer cores, and there Julia in Atom does not crash.
It looks like it: both systems have 32 GB of memory; on one the workload is 9 GB, on the other 24 GB. The difference is quite striking. Did you check the base load of the two systems without running your code? If the base load is similar, we could have a problem. From the looks of it, you could also be running different Julia versions?
I haven't checked the base load, but I run Ubuntu and the same Julia version on both. These two computers run nothing other than this code (and the OS).
Could it be that the 24-core machine simply needs a lot of RAM to keep 24 cores busy?
Could be. But I don't fully understand your problem. Take matrix multiplication as an example: you can scale it to a high number of cores that all compute on different parts of the matrices, without much memory overhead. Something seems special here, in that every one of your threads needs so much memory…
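To make that concrete, here is a minimal sketch of what I mean: every thread reads the same A and B and writes a disjoint column of C, so memory stays at one copy of each matrix no matter how many cores run:

```julia
using LinearAlgebra

# Threads share A and B and each writes its own column of C:
# total memory is one copy of each matrix, independent of thread count.
function threaded_mul!(C, A, B)
    Threads.@threads for j in 1:size(B, 2)
        @views mul!(C[:, j], A, B[:, j])
    end
    return C
end

A = rand(1000, 1000); B = rand(1000, 1000); C = similar(A)
threaded_mul!(C, A, B)
```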
Each thread needs to solve a dynamic programming problem (which uses big arrays) and run its own Monte Carlo experiment. That's the nature of the model.
So it's not the case that all the threads work on shared arrays.
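That would also explain the scaling: with one workspace per thread, resident memory grows roughly linearly in the number of threads, so 24 cores cost far more RAM than a handful. A hypothetical sketch of the pattern (names and array sizes are placeholders, not my actual model); preallocating one workspace per thread and reusing it across Sobol start points at least caps memory at nthreads() copies:

```julia
# Placeholder per-thread scratch space for the DP solve and the simulation.
struct Workspace
    value_fn::Array{Float64,3}   # big DP arrays (sizes are made up)
    shocks::Matrix{Float64}      # Monte Carlo draws
end
Workspace() = Workspace(zeros(200, 200, 50), zeros(10_000, 20))

# One workspace per thread, allocated once and reused for every start point.
const WORKSPACES = [Workspace() for _ in 1:Threads.nthreads()]

function objective(θ)
    ws = WORKSPACES[Threads.threadid()]  # assumes tasks don't migrate threads
    # ... solve the DP into ws.value_fn, simulate with ws.shocks, return the loss ...
    return 0.0                           # placeholder
end
```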