Parameter Structure in Array

I’m conducting a set of simulations where each scenario has a set of parameters that can vary. These are scalars (e.g., random number seed, sample size, dropout rate, number of simulations), vectors (e.g., a set of evaluation times), and matrices (e.g., event type by survival time per stage of event type). The scenarios are not necessarily regular in organization, and the number and types of parameters are evolving over time.

Currently, the scenarios are indexed in a CSV file, one scenario per line. Each line is read in and parsed as needed (e.g., reading a matrix in based on a filename in the CSV file). I have been using a DataFrame to hold each parsed row, but I think it would be useful to switch to a structure. Then, instead of do_one(param1, param2, param3, etc.), I can call do_one(scenario). This will make it easier to pass all the parameters to each function, and I think it will be much easier to change as the parameter set evolves.

The current loop looks like:

for i in 1:n  # number of scenarios
    for j in 1:m  # number of simulations
        result = do_one(param1, param2, param3)  # etc.; to become do_one(scenario)
        # do stuff with result
    end
end

The next step will be to parallelize this, probably starting with the outer loop.

Generally, how can I set up an array of parameter structures (using, e.g., Parameters.jl), and how can I access it correctly inside the first loop? For example, could I do something like scenario = scenarios[i], where scenarios is my array of parameter structures that I cobble together from the CSV input?

This is a simple way of setting up the parameter array (without Parameters.jl) from a sample CSV file (loaded from memory so the snippet is self-contained).

julia> using CSV

julia> data = """
       x,y,z
       1,2,3
       4,5,6
       """;  # Sample data as if within a CSV file

julia> csv = CSV.File(IOBuffer(data))  # Load sample data from memory
2-element CSV.File{false}:
 CSV.Row: (x = 1, y = 2, z = 3)
 CSV.Row: (x = 4, y = 5, z = 6)

julia> struct A
           x
           y
           z
       end

julia> A(r::CSV.Row) = A((r[k] for k in fieldnames(A))...)  # Constructor loads row by name
A

julia> jobs = [A(row) for row in csv] 
2-element Array{A,1}:
 A(1, 2, 3)
 A(4, 5, 6)

You could loop over the array objects directly:

julia> function simulation(job)
           println("This is the simulation with $job where job.x=$(job.x)")
       end
simulation (generic function with 1 method)

julia> for job in jobs
           simulation(job)
       end
This is the simulation with A(1, 2, 3) where job.x=1
This is the simulation with A(4, 5, 6) where job.x=4
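
Indexing with scenario = scenarios[i] works the same way, and it is the form you would likely want once the outer loop is parallelized. Below is a minimal sketch of that, not a definitive implementation. The Scenario struct, its field names, and the body of do_one are all made up for illustration. Base.@kwdef is used as a dependency-free stand-in for the keyword constructors with defaults that Parameters.jl provides via @with_kw. Each scenario gets its own random number generator seeded from its seed field, so the threads do not share RNG state.

```julia
using Random

# Hypothetical parameter struct; field names are illustrative only.
Base.@kwdef struct Scenario
    seed::Int = 1
    sample_size::Int = 100
    dropout_rate::Float64 = 0.0
    n_sims::Int = 10
    eval_times::Vector{Float64} = [1.0, 2.0]
end

# Placeholder for the real simulation: the mean of sample_size uniform draws.
function do_one(s::Scenario, rng::AbstractRNG)
    return sum(rand(rng, s.sample_size)) / s.sample_size
end

scenarios = [Scenario(seed = 1), Scenario(seed = 2, sample_size = 50, n_sims = 5)]

# One result vector per scenario; each thread writes only to its own slot.
results = Vector{Vector{Float64}}(undef, length(scenarios))

Threads.@threads for i in eachindex(scenarios)
    scenario = scenarios[i]
    rng = MersenneTwister(scenario.seed)  # per-scenario RNG
    results[i] = [do_one(scenario, rng) for _ in 1:scenario.n_sims]
end
```

Start Julia with more than one thread (e.g., julia -t 4) for the outer loop to actually run in parallel; with a single thread the same code runs serially.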


It’s much better to use structs than individual parameters, so you’re on the right track.

You can use Flatten.jl to update the parameters in your structs from a DataFrame row. You can also flatten the struct parameter names and values to build DataFrames to write back to CSV.

This solution is general: it will work for any model structure (even deeply nested structs), and you can ignore fields that aren’t parameters fairly easily.
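
To make the round trip concrete without guessing at Flatten.jl’s exact API, here is a rough Base-only sketch for a flat (non-nested) struct. The names Params, to_row, and from_row are hypothetical helpers invented for this example. It only shows the idea of turning field names and values into a row and back; nested structs and skipping non-parameter fields are what Flatten.jl is for.

```julia
# Hypothetical flat parameter struct.
struct Params
    x::Int
    y::Float64
    z::String
end

# Struct -> NamedTuple: field names become column names (one table row).
function to_row(p)
    names = fieldnames(typeof(p))
    return NamedTuple{names}(Tuple(getfield(p, k) for k in names))
end

# Row -> struct: works for any row type that supports property access by field name.
from_row(::Type{T}, row) where {T} = T((getproperty(row, k) for k in fieldnames(T))...)

p = Params(1, 2.0, "a")
row = to_row(p)
p2 = from_row(Params, row)
```

A vector of such NamedTuples is a valid Tables.jl row table, so it can be passed to DataFrame or CSV.write.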


These are both super helpful answers! I’m trying both approaches out and these also helped me to understand how Parameters.jl works.
