Large-scale model using Gurobi crashes the machine

For the record, I still have not found the reason why my server crashes when solving some large models using JuMP + Gurobi, but I have experienced it dozens of times.
Here is the situation. Each time I construct a JuMP model with about 200B variables, with about three instances (each takes about 250 GB using direct mode; the total memory consumption is within the machine's 1.5 TB limit), the machine crashes and no traces are left in the system or Gurobi logs. It seems completely random: I cannot control it, and I don't know when it will happen, why it happens, or how to fix it. I have to reboot the machine, since neither SSH nor a direct cable connection is available when this happens.
It's killing me! Has anybody else encountered the same issue?
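Here is roughly how each instance is set up, in case it helps (a minimal sketch of direct mode; the tiny variable block is a placeholder, not the real model):

```julia
using JuMP, Gurobi

# Direct mode: JuMP talks to the Gurobi C API without a caching layer,
# so each instance's memory footprint tracks the underlying Gurobi model.
model = direct_model(Gurobi.Optimizer())
@variable(model, x[1:1_000] >= 0)  # placeholder; the real models are vastly larger
```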

I encountered similar issues a while ago, but they were not related to Gurobi or JuMP and were likely a Julia issue (I was able to build and solve the model effectively once I migrated to another language).
I’m guessing this is during the model-building phase. Have you tested other solver interfaces?

Gurobi.jl wraps the 32-bit Gurobi API, so you cannot have more than 2_147_483_647 variables.
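You can see where that ceiling comes from in plain Julia (the guard function below is a hypothetical pattern, not part of Gurobi.jl):

```julia
# Gurobi.jl passes variable and constraint indices to the C API as Cint
# (32-bit signed integers), so the hard ceiling is:
typemax(Cint)  # 2_147_483_647

# Hypothetical guard to run before attempting to build a huge model:
function check_variable_count(n_vars::Integer)
    if n_vars > typemax(Cint)
        error("$n_vars variables exceeds the 32-bit Gurobi.jl limit of $(typemax(Cint))")
    end
    return n_vars
end
```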

Normally the model building is smooth; the machine crashes during the solving process.

@odow Are there any future plans for Gurobi.jl to extend this limit? Gurobi itself supports 64-bit indexing and does not limit the number of variables. Another question: does this mean the underlying floating-point precision is restricted to 32 bits, or is it something else?

There are no plans.

It doesn't make sense to try to solve a problem with that many variables. Even if you could build it, Gurobi is unlikely to solve it.

What is your application? How can you interpret a problem with 10^11 decision variables?

@odow I constructed a problem with 8760 time slots and over 1000 locations. The high spatiotemporal resolution results in a very large variable space, which is necessary for me to capture the temporal dynamics and precise spatial locations.
Since Gurobi supports at most 2147483647 variables, I think I do need to reduce the resolution.

So you have 8.76e6 time/location slots. But that doesn't explain how you then have an additional ~1e4 variables for each of the time/location pairs?

The 1e11 figure is hypothetical. The most common situation is about 1.9e8 variables.
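Putting those numbers together (a rough back-of-the-envelope sketch; the per-pair counts are inferred from the figures in this thread, not measured):

```julia
time_slots = 8_760                   # hourly resolution over one year
locations  = 1_000
pairs      = time_slots * locations  # 8_760_000 time/location pairs

1.9e8 / pairs  # ≈ 22 variables per pair in the common 1.9e8-variable case
1e11 / pairs   # ≈ 11_400 variables per pair would be needed to reach 1e11
```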

To clarify the future plans: there are GRBX routines which allow more than 2e9 non-zeros in the constraint matrix: GRBXloadmodel - Gurobi Optimization. But they still don’t support more than typemax(Cint) variables or constraints.

In that case you’re likely hitting the limit of the number of non-zeros in the constraint matrix.
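To make that concrete (a rough sketch; the average density per variable is a guess, and if your JuMP version has lp_matrix_data you can measure the real count with SparseArrays.nnz on the returned A matrix):

```julia
# Estimate constraint-matrix nonzeros and compare against the 32-bit limit.
n_vars        = 190_000_000           # ~1.9e8 variables, as reported above
nnz_per_var   = 12                    # hypothetical average column density
estimated_nnz = n_vars * nnz_per_var  # 2_280_000_000
estimated_nnz > typemax(Cint)         # true: exceeds 2_147_483_647
```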

Thanks for the reply. This is of great help. Now I finally know where the boundary is.
