For the record, I still have not found the reason why my server crashed when solving some large models using JuMP + Gurobi, but I have experienced it dozens of times.
Here is the situation. Each time I construct a JuMP model with about 200B variables across three instances (each takes about 250 GB using direct mode, so the total memory consumption is within the machine's 1.5 TB limit), the machine crashes and no traces are left in the system or Gurobi logs. It seems completely random, something I cannot control at all: I don't know when it will happen, how it happens, or how to fix it. I have to reboot the machine since neither SSH nor a cable connection is available when this happens.
It’s killing me! Has anybody else encountered the same issue?
I encountered similar issues a while ago, but they were related to neither Gurobi nor JuMP and were likely a Julia issue (I was able to build and solve the model effectively once I migrated to another language).
I’m guessing this is during the model-building phase. Have you tested other solver interfaces?
Gurobi.jl wraps the 32-bit Gurobi API, so you cannot have more than 2_147_483_647 variables.
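A quick REPL check makes the boundary concrete (a minimal sketch, using the ~200B variable count mentioned above):

```julia
julia> typemax(Cint)  # the largest index the 32-bit Gurobi API can represent
2147483647

julia> 200_000_000_000 > typemax(Cint)  # ~200B variables is ~100x over the limit
true
```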
Normally the model building is smooth; the machine crashes during the solving process.
@odow Are there any future plans for Gurobi.jl to extend this limit, since Gurobi itself doesn't limit the number of variables and supports 64-bit? Another question: does this mean the underlying floating-point precision is restricted to 32 bits, or is it something else?
There are no plans.
It doesn't make sense to try to solve a problem with that many variables. Even if you could build it, Gurobi is unlikely to solve it.
What is your application? How can you interpret a problem with 10^11 decision variables?
@odow I constructed a problem with 8760 time slots and over 1000 locations. The high spatiotemporal resolution results in a very large variable space, which is necessary for me to capture the temporal dynamics and precise spatial locations.
Since Gurobi supports at most 2147483647 variables, I think I do need to reduce the resolution.
So you have 8.76e6 time/location slots. But that doesn't explain how you then have an additional ~2e4 variables for each of the time/location pairs?
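For concreteness, the back-of-envelope arithmetic (pure Julia, using the counts from the posts above):

```julia
julia> pairs = 8760 * 1000        # time slots × locations
8760000

julia> 200_000_000_000 / pairs    # implied variables per time/location pair
22831.050228310502
```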
The 1e11 figure is only hypothetical. The most common situation is about 1.9e8 variables.
To clarify the future plans: there are GRBX routines which allow more than 2e9 non-zeros in the constraint matrix (GRBXloadmodel - Gurobi Optimization), but they still don't support more than typemax(Cint) variables or constraints.
In that case you’re likely hitting the limit of the number of non-zeros in the constraint matrix.
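As a rough sanity check before building (a sketch only: the 1.9e8 figure comes from the post above, but the constraint count and average row density here are hypothetical assumptions, not measured from the actual model):

```julia
julia> n_cons = 190_000_000    # hypothetical: assume ~1.9e8 constraints, same order as the variables
190000000

julia> avg_nnz_per_row = 20    # hypothetical average non-zeros per constraint row
20

julia> n_cons * avg_nnz_per_row > typemax(Cint)  # 3.8e9 exceeds the 32-bit non-zero limit
true
```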
Thanks for the reply. This is of great help. Now I finally know where the boundary is.