Is JuMP.@variable slow when the number gets large?

I have a Unit commitment (24 hours, 5000 scenarios, IEEE118) problem, that has these two lines

@time JuMP.@variable(m, pA[i=_2(t,T,1), _0(i,t,S), agi])
@time JuMP.@variable(m, pf[i=_2(t,T,0), _0(i,t,S), 1:size(F,1)])

where

_3(i, t, s) = (s-1)*(i>t)+1 # return `s` or `1`
_0(i, t, S) = 1:_3(i, t, S) # S in this case is 5000
_2(t, T, p) = t-p:t+T

and F is the PTDF matrix, agi is a Vector{Int}.
The length of pA is approximately 24 * 5000 * 13 == 1560000,
whereas the length of pf is approximately 24 * 5000 * 186 == 22320000.

And their runtime performance is

adding pA variables...
  8.563364 seconds (36.34 M allocations: 1.792 GiB, 29.52% gc time, 2.12% compilation time)
adding pf variable...
146.426164 seconds (535.10 M allocations: 30.329 GiB, 39.43% gc time, 0.11% compilation time)

Note that the code are inside function bodies (and actually inside my custom package) so they had been compiled.

Do you think adding 1.5 million variables using 8.5 seconds is reasonable?

And also, 8.56 / 13 * 186 = 122.47 < 146.426 seconds.

My model is a Gurobi’s direct model.

I think building large models (in my case, large num of scenarios) is really a bottleneck (Gurobi can almost solve that model in 1 hour, but the modeling code runs for nearly 2 hours). I wonder what is a good solution?

julia> JuMP.optimize!(m)
Gurobi Optimizer version 13.0.1 build v13.0.1rc0 (linux64gpu - "Ubuntu 24.04.4 LTS")

CPU model: AMD EPYC 7763 64-Core Processor, instruction set [SSE2|AVX|AVX2]
Thread count: 128 physical cores, 256 logical processors, using up to 32 threads

GPU model: NVIDIA RTX A6000, CUDA compute version 8.6, NVIDIA driver compatible with CUDA version 13

Non-default parameters:
TimeLimit  3600
Method  6
PDHGGPU  1
Threads  32

Optimize a model with 65080188 rows, 63480926 columns and 1130514176 nonzeros (Min)
Model fingerprint: 0x608a9aa9
Model has 0 linear objective coefficients
Variable types: 54510250 continuous, 8970676 integer (0 binary)
Coefficient statistics:
  Matrix range     [8e-06, 2e+01]
  Objective range  [0e+00, 0e+00]
  Bounds range     [2e-04, 6e+00]
  RHS range        [2e-07, 5e+01]

Presolve removed 0 rows and 0 columns (presolve time = 50s)...
Presolve removed 230001 rows and 231033 columns (presolve time = 73s)...
Presolve removed 42424933 rows and 42156055 columns (presolve time = 2295s)...
Presolve time: 2294.90s
Presolved: 22655255 rows, 21324871 columns, 176926424 nonzeros
Variable types: 12354819 continuous, 8970052 integer (8970052 binary)

Start PDHG on GPU

                       Objective                Residual
     Iter       Primal          Dual         Primal    Dual     Compl    Time
        0   0.00000000e+00  0.00000000e+00  5.25e+01 0.00e+00  0.00e+00 2962s
      202   0.00000000e+00  0.00000000e+00  5.25e+01 0.00e+00  0.00e+00 2967s
     1651   0.00000000e+00  4.58872503e-07  4.34e-09 1.66e-06  3.55e-12 3006s
     1751   0.00000000e+00 -7.24419036e-09  1.49e-09 2.59e-08  1.83e-18 3008s

PDHG solved model in 1751 iterations and 3008.98 seconds (5455.37 work units)
Optimal objective 0.00000000e+00


Root crossover log...

  166145 DPushes remaining with DInf 0.0000000e+00              3076s

 13843535 PPushes remaining with PInf 2.0464937e-05              3088s
 10311631 PPushes remaining with PInf 2.0047059e-04              3111s
 3091561 PPushes remaining with PInf 9.9011976e-05              3591s
Crossover changed status from Optimal to Time Limit

Root relaxation: time limit, 10783309 iterations, 871.47 seconds (2725.82 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0          -    0               -    0.00000      -     - 3611s

Explored 1 nodes (10783309 simplex iterations) in 3613.65 seconds (5600.42 work units)
Thread count was 32 (of 256 available processors)

Solution count 0

Time limit reached
Best objective -, best bound 0.000000000000e+00, gap -

User-callback calls 413829, time in user-callback 0.32 sec

julia> JuMP.solution_summary(m)
solution_summary(; result = 1, verbose = false)
├ solver_name          : Gurobi
├ Termination
│ ├ termination_status : TIME_LIMIT
│ ├ result_count       : 0
│ ├ raw_status         : Optimization terminated because the time expended exceeded the value specified in the TimeLimit parameter.
│ └ objective_bound    : 0.00000e+00
├ Solution (result = 1)
│ ├ primal_status        : NO_SOLUTION
│ ├ dual_status          : NO_SOLUTION
│ └ relative_gap         : 1.00000e+100
└ Work counters
  ├ solve_time (sec)   : 3.61365e+03
  ├ simplex_iterations : 10783309
  ├ barrier_iterations : 0
  └ node_count         : 1

julia> m
A JuMP Model
├ mode: DIRECT
├ solver: Gurobi
├ objective_sense: MIN_SENSE
│ └ objective_function_type: JuMP.AffExpr
├ num_variables: 63480926
├ num_constraints: 103492279
│ ├ JuMP.VariableRef in MOI.GreaterThan{Float64}: 17710727
│ ├ JuMP.AffExpr in MOI.EqualTo{Float64}: 49250187
│ ├ JuMP.VariableRef in MOI.LessThan{Float64}: 11730688
│ ├ JuMP.AffExpr in MOI.GreaterThan{Float64}: 230001
│ ├ JuMP.AffExpr in MOI.LessThan{Float64}: 15600000
│ └ JuMP.VariableRef in MOI.Integer: 8970676
└ Names registered in the model
  └ :p, :pA, :pf, :r, :rint, :u, :ur, :v, :x, :ϖ