Running files multiple times and saving output of each instance

If I have a file that I would like to run multiple times, I know I could use an include statement within a for loop.

I would like to run such a file, say, N times and save the output of each iteration. What would be an efficient way to do this, and how can I save the output each time without overwriting the previous output?

If you are doing simulations, you could consider DrWatson.jl, which helps with this task and more.

Personally, I would recommend wrapping whatever you want to do in a function and calling that function N times, perhaps in a Threads.@threads for loop so that the runs execute in parallel.
Another advantage of wrapping everything in a function is that it avoids global variables, which often improves performance as well.

The setup would look like this

# simulation.jl
using Random

function simulate(p, seed = 0)
    (; alpha, beta, n) = p   # unpack the parameters (needs Julia 1.7 or later)
    Random.seed!(seed)       # set the random seed, if your simulation uses random values

    # do your stuff; here: n uniform draws scaled by alpha plus n normal draws scaled by beta
    return alpha .* rand(n) .+ beta .* randn(n)
end

and then the main file could look like this:

# run_parallel.jl (start Julia with several threads, e.g. `julia -t 4 run_parallel.jl`)
include("simulation.jl")
using DelimitedFiles   # provides writedlm; swap in whatever you need to save your output
using ProgressMeter

p = (alpha = 10, beta = 20, n = 100)   # your parameters for the simulation

N = 100
prog = Progress(N)

Threads.@threads for seed in 1:N
    results = simulate(p, seed)
    # the seed is part of the file name, so no run overwrites another
    writedlm("output_$(seed).csv", results, ',')
    next!(prog)  # update the progress meter
end

I added a few more things here:

  1. Make sure you initialise the random seed to make things reproducible.
  2. Avoid global variables; instead, store all parameters in one object (here a NamedTuple) and pass it to the function. That is very useful for making things fast!
  3. I find it useful to track progress with ProgressMeter.jl. If your simulation is stuck, it can save you a lot of time to see early on that it would take years to finish.
  4. Keep things modular. If you have one function to simulate and another to save results, you can mix and match them, for example if you also want a script that does something else with your simulation.
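
As a minimal sketch of point 4, and of the original question about not overwriting earlier output, the saving step could live in its own small function. The name `save_run`, the zero-padded file naming scheme, and the choice of `DelimitedFiles` are assumptions for illustration, not something the thread prescribes:

```julia
using DelimitedFiles

# Hypothetical helper: writes one run to its own file and refuses to overwrite.
function save_run(dir, seed, results)
    mkpath(dir)                                   # create the output folder if needed
    path = joinpath(dir, "output_$(lpad(seed, 4, '0')).csv")
    isfile(path) && error("refusing to overwrite $path")
    writedlm(path, results, ',')                  # one value per line for a vector
    return path
end

# Usage with a temporary directory and made-up values
dir = mktempdir()
path = save_run(dir, 7, [1.0, 2.0, 3.0])
```

Zero-padding the seed keeps the files in run order when they are sorted by name, and the `isfile` check turns an accidental rerun into an error instead of silent data loss.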

This is very helpful, which opens up another question.

Currently I use one file to generate data, which is saved to a CSV, and then another file to read that CSV and compute an estimator of the parameters. Each time the first file runs, the parameters are generated randomly.

Could I wrap this all into a single function?


Yes, it would make a lot of sense to skip that intermediate step.

One quick way is to use NamedTuples or DataFrames to represent the data within Julia.
You could do something like data = generate_data(p, seed) and then param_est = estimate_parameters(p, data).

So you could use two functions if that makes sense for you, but you could also wrap everything into one function.
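
A rough sketch of that two-function layout, using only the standard library, might look as follows. The model (normally distributed data with a randomly drawn mean and standard deviation), the fields of `p`, and the wrapper `run_once` are all assumptions made up for illustration; only the names `generate_data` and `estimate_parameters` come from the suggestion above:

```julia
using Random
using Statistics

# Hypothetical generator: draws random "true" parameters, then samples data from them.
function generate_data(p, seed)
    rng = Random.Xoshiro(seed)   # local, seeded RNG (Julia 1.7 or later)
    mu = p.mu_range[1] + (p.mu_range[2] - p.mu_range[1]) * rand(rng)
    sigma = p.sigma_range[1] + (p.sigma_range[2] - p.sigma_range[1]) * rand(rng)
    x = mu .+ sigma .* randn(rng, p.n)
    return (; x, truth = (; mu, sigma))
end

# Hypothetical estimator: sample mean and sample standard deviation.
function estimate_parameters(p, data)
    return (; mu = mean(data.x), sigma = std(data.x))
end

# One complete run, with no CSV file in between.
function run_once(p, seed)
    data = generate_data(p, seed)
    est = estimate_parameters(p, data)
    return (; seed, truth = data.truth, est)
end

p = (n = 10_000, mu_range = (-1.0, 1.0), sigma_range = (0.5, 2.0))
results = [run_once(p, seed) for seed in 1:5]
```

Each element of `results` carries the seed, the randomly drawn parameters, and the estimates, so a run can be reproduced from its seed alone. Keeping the two steps as separate functions still lets you write the data to disk when you want to inspect it.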
