Ad 1:
I tested this a bit more, and it seems that memory allocation and deallocation add considerable overhead (the whole GSA process allocated around 200 GB overall). So I changed from solving a single Ensemble to solving chunks of the parameter space at a time and reassembling the output array afterwards (the code is not yet cleaned up, but it works).
# Define GSA solver function:
ensemble_chunked = function (p)
    global parameterset = p
    Np::Int64 = size(p, 2)
    println("Running ", Np, " cases.")
    chunksize::Int64 = div(Np, chunks)   # assumes Np is divisible by chunks
    println("Using ", chunks, " chunk(s) of size ", chunksize, ".")
    out = zeros(3, Np)
    for k in 1:chunks
        offset = (k - 1) * chunksize
        startindx = offset + 1
        endindx = k * chunksize
        println("Starting chunk ", k, ", from ", startindx, " to ", endindx, " with offset ", offset, ".")
        # Solve one chunk of the parameter space as its own Ensemble:
        pchunk = p[:, startindx:endindx]
        prob_func(prob, i, repeat) = remake(prob; u0 = [LV_Vt0, LA_Vt0, pt0sas, qt0sas, pt0sat, qt0sat, pt0svn], p = pchunk[:, i])
        ensemble_prob = EnsembleProblem(prob, prob_func = prob_func)
        @time "ODE Ensemble solved in" sol = solve(ensemble_prob, GPUTsit5(), EnsembleThreads(); reltol = 1e-8, abstol = 1e-8, saveat = saveat, trajectories = chunksize)
        # Reassemble the output array from the chunk results:
        @time "Results processed in" Threads.@threads for i in 1:chunksize
            out[1, i + offset] = whatever_measure1(sol[i])
            out[2, i + offset] = whatever_measure2(sol[i])
            out[3, i + offset] = whatever_measure3(sol[i])
        end
    end
    out
end
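The index arithmetic assumes Np is divisible by chunks; a quick toy check that the chunk ranges tile 1:Np with no gaps or overlaps (a minimal sketch with made-up sizes, separate from the actual run) could look like:
# Toy check of the chunk index arithmetic (hypothetical sizes):
Nptest, chunkstest = 12, 3
cs = div(Nptest, chunkstest)
ranges = [(k - 1) * cs + 1 : k * cs for k in 1:chunkstest]
@assert reduce(vcat, collect.(ranges)) == collect(1:Nptest)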
# Set up Sobol solution:
chunks::Int64 = 100
N = 20000
lb = [5.0, 1.0, 0.07, 1.75, 1.0, 0.21, 0.360, 0.0, 4.0, 1.0, 0.105, 0.175, 1.0, 0.0315, 0.063, 0.92, 0.0231, 0.042, 0.0, 0.056, 0.0021, 4.34e-5, 0.0, 1.12, 0.035, 0.00119, 0.35, 0.364, 0.0525, 0.0, 14.35]
ub = [5.0, 1.0, 0.13, 3.25, 1.0, 0.34, 0.585, 0.0, 4.0, 1.0, 0.165, 0.325, 1.0, 0.0585, 0.117, 0.92, 0.0429, 0.078, 0.0, 0.104, 0.0039, 8.06e-5, 0.0, 2.08, 0.065, 0.00221, 0.65, 0.676, 0.0975, 0.0, 26.65]
bounds = tuple.(lb,ub)
sampler = SobolSample()
A,B = QuasiMonteCarlo.generate_design_matrices(N, lb, ub, sampler)
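# Sanity check of my assumption (not from the original post) that QuasiMonteCarlo.jl
# stores one sample per column, i.e. each design matrix is length(lb) × N:
@assert size(A) == (length(lb), N) && size(B) == (length(lb), N)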
# And call it:
Sobol_JS_C = gsa(ensemble_chunked, Sobol(), A, B, batch = true)
With this I went from 100 h to 1 h wall-clock time on a 24-core node, which is pretty close to the estimate based on single ODE solver runs, so it scales well.
Ad 2:
I’m not sure that’s required anymore. I had assumed the long run time was somehow related to the Sobol calculations, but it seems the process dropped down to a single core for the memory-management tasks, which strace confirmed.
With the chunked approach the Sobol indices are calculated in a few seconds, which is negligible relative to the ODE solver runtime.
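For reference, reading the indices off the result object might then look like this (a minimal sketch; it assumes GlobalSensitivity.jl's Sobol result exposes first-order indices as S1 and total-order indices as ST, with one row per output measure):
# Inspect first-order (S1) and total-order (ST) indices for the 3 output measures:
for m in 1:3
    println("Measure ", m, ": S1 = ", Sobol_JS_C.S1[m, :])
    println("Measure ", m, ": ST = ", Sobol_JS_C.ST[m, :])
end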