Discrete event simulation framework for discrete time

manuka · February 17, 2021, 2:48am

I would like to receive feedback on my simulation model. I have posted it in codereview.stackexchange but recieved almost no feedback (Discrete event simulation framework for discrete times in Julia - Code Review Stack Exchange).

Here is the post:

I would like to receive constructive feedback on my simulation framework (this is for my thesis work). I have no formal background in programming, and I am wondering if there is a more effective and efficient way to structure my codes for better readability and scalabiilty so that people can easily understand, modify, and build on my codes while achieving high computational efficiency.

Here is a brief description of my simulation model. For each agent I create in my simulation world, they go through a number of event processes annually until they die, reach the maximum age (e.g., 111), or reach the max. time window (e.g., 2050 if an agent is created in 2015 and I set the time window to be 35). I am using a modular apporach, in which each module contains at least one process (so far I have not encoutered more than 1, but I think I would in the future): a process function for that module and input parameters for that process function.

To facilliate that framework, I have created an abstract type Simulation_Module under which event modules (birth and death) are contained.

abstractModule.jl:

abstract type Agent_Module end

abstract type Simulation_Module end

# sub modules
abstract type Birth_Module <: Simulation_Module  end
abstract type Death_Module <: Simulation_Module  end

agent.jl: Here is an agent module

mutable struct Agent{Z<:Bool,Y<:Int64}  <: Agent_Module
    sex::Z
    age::Y
    birth_year::Y
    cal_year::Y
    alive::Z
end

function set_agent!(ag, sex, age, birth_year,cal_year,alive)
    ag.sex = sex;
    ag.age= age;
    ag.birth_year = birth_year;
    ag.cal_year = cal_year;
    ag.alive = alive;
    nothing
end

agent = Agent(false,0,2015,2015,true)

birth.jl: Birth module

mutable struct Birth <: Birth_Module
    parameters::Union{AbstractDict,Nothing}
    process::Function
end

parameter_names = nothing

function birth_process(cal_year_index::Int64,parameters::Union{AbstractDict,Nothing}=nothing)::Bool
    rand(Bernoulli(0.5))
end

birth = Birth(Dict_initializer(parameter_names),birth_process)

death.jl

mutable struct Death <: Death_Module
    parameters::Union{AbstractDict,Nothing}
    process::Function
end

parameter_names = nothing

function death_process(ag::Agent,parameters::Union{AbstractDict,Nothing}=nothing)::Bool
    rand(Bernoulli(0.5))
end

death = Death(Dict_initializer(parameter_names),death_process)

Simulation.jl: It contains a struct for Simulation and a run function for running a simulation.

using Setfield, Distributions, StatsFuns
using TimerOutputs

function Dict_initializer(parameter_names::Union{Nothing,Vector{Symbol}})
    isnothing(parameter_names) ? nothing : Dict(parameter_names .=> missing)
end

# abstract types
include("abstractModule.jl")
# agent
include("agent.jl")
# event proceses
include("birth.jl")
include("death.jl")


mutable struct Simulation <: Simulation_Module
    max_age::Int64
    starting_calendar_year::Int64
    time_horizon::Int64
    n::Int64 # number of agents to create in each year
    Agent::Agent_Module
    Birth::Birth_Module
    Death::Death_Module
    OutcomeMatrix::AbstractDict # needs to be a dictionary
end


function run(simulation::Simulation_Module)

    # Is it better to define the fields of the simulation object
    # separately if I am going to use
    # them in simulation multiple times?
    max_age::Int64 = simulation.max_age
    min_cal_year::Int64 = simulation.starting_calendar_year
    max_cal_year::Int64 = min_cal_year + simulation.time_horizon - 1
    simulation.n = (isnothing(simulation.n) ? 100 : simulation.n)
    n::Int64 = simulation.n

    max_time_horizon::Int64 = simulation.time_horizon
    cal_years::Vector{Int64} = collect(min_cal_year:1:max_cal_year)

    # store events
    # total num of agents
    n_list = zeros(Int64,simulation.time_horizon,2)
    # store events by cal year, age, and sex for each event type
    event_list::Vector{String} = ["death","alive"]
    event_matrix = zeros(Int64,length(cal_years),max_age,2,length(event_list))

    # time the performance
    to = TimerOutput()

    # initiate variables
    tmp_n::Int64 = 0
    tmp_cal_year_index::Int64 = 0

    @timeit to "sleep" sleep(0.02)
    for cal_year in cal_years
        # for each calendar year
        @timeit to "calendar year $cal_year" begin
            tmp_cal_year_index = cal_year - min_cal_year + 1

            for i in 1:n
                # initialization: create a new agent
                set_agent!(simulation.Agent,simulation.Birth.process(tmp_cal_year_index),0,tmp_cal_year_index,tmp_cal_year_index,true)

                # update the total number of agents added to the simulation
                n_list[tmp_cal_year_index,simulation.Agent.sex+1] +=1

                # go through event processes for each agent until
                # death or max age or max time horizon
                while(simulation.Agent.alive && simulation.Agent.age <= max_age && simulation.Agent.cal_year <= max_time_horizon)

                    # I have other event processes

                    # death - the last process
                    if simulation.Death.process(simulation.Agent)
                        # dead
                        simulation.Agent.alive = false
                        event_matrix[simulation.Agent.cal_year,simulation.Agent.age+1,simulation.Agent.sex+1,1] += 1
                    else
                        # still alive!
                        event_matrix[simulation.Agent.cal_year,simulation.Agent.age+1,simulation.Agent.sex+1,2] += 1
                        simulation.Agent.age += 1
                        simulation.Agent.cal_year += 1
                    end

                end # end while loop

            end # end for loop: agents

        end # end of begin timeit

    end # end for loop: cal year
    print_timer(to::TimerOutput)

    # convert the event matrix into a dictionary
    tmp_event_dict = Dict()
    for jj in 1:length(event_list)
        tmp_event_dict[event_list[jj]] = [event_matrix[:,:,1,jj],event_matrix[:,:,2,jj]]
    end

    # reshape event matrix into a dictionry of list of matrices
    simulation.OutcomeMatrix = Dict("n" => n_list,
                                    "outcome_matrix" => tmp_event_dict)

    print("\n Simulation finished. Check your simulation object for results.")
    return nothing
end

# test run
simulation = Simulation(111,2015,25,5000, agent, birth, death,Dict());

run(simulation)

You can directly download the files from:
https://www.dropbox.com/sh/nrip5dc2rus6oto/AADIlZsngwjuUOwir9O2AeMsa?dl=0

I have just provided two modules, and my run function is very simple but you can imagine I would have many more processes and hence a complicated run function.

Besides any feedback you may have, I would like to know whether there is a better (in terms of readability, scalability, and efficiency) frameworks/tools I could use (e.g., software-wise, structure of “modules”, julia types, built-in functions, etc).

If anything is unclear please let me know. Thanks!

Skoffer · February 17, 2021, 8:57am

Here are some notes, maybe they’ll be of some help.

First of all some general comments.

It would be nice to put this code at some git repository, like github.com or gitlab.com. It makes the process of introducing changes, getting feedback, and so on much much easier. Maybe you already did it, but if not, it’s worth some time investment. It’s a huge win in a long run and it is better to start use from the beginning.
There is a Julia framework for this sort of things, named Agents.jl. I haven’t used it myself, but it is very popular and probably it’s better to use it, since it has established community, developers and some tooling, like plotting facilities for simulation profiling.

Some notes regarding the code itself.

If it is possible, it is better not to use the word module in type names. It is not prohibited, but since Modules have a very special meaning of its own, it’s better not to confuse code readers. Maybe use something like AbstractSimulation instead of Simulation_Module?
Bool and Int64 are concrete types, so Agent definition is extraneous. You can just define it as

mutable struct Agent  <: Agent_Module
    sex::Bool
    age::Int
    birth_year::Int
    cal_year::Int
    alive::Bool
end

Code is over annotated. Unlike C or other typed languages, you do not need to annotate variables everywhere, quite the contrary. It can be hard psychologically, but still, you should try it: remove all annotations in function declarations and reintroduce them only when needed. So, for example birth_process defined as

function birth_process(cal_year_index, parameters)
...

and inside run function

function run(simulation)
    max_age = simulation.max_age
    min_cal_year = simulation.starting_calendar_year
    max_cal_year = min_cal_year + simulation.time_horizon - 1
    simulation.n = (isnothing(simulation.n) ? 100 : simulation.n)
    n = simulation.n
...
end

and so on. Extra annotation does not help the compiler, but makes the code less readable. Of course, it should be reintroduced for multiple dispatch, but only when multiple dispatch is being used. I.e. you should apply KISS principle.

Maybe somewhat complicated advice: it’s better to avoid mutable structures if possible. In practice, immutable structures and change them with @set/@set! macros from Setfield.jl package. It may require some changes in your code, but it’s doable and makes your life easier in the long run.
Death and Birth structures are nonidiomatic. Generally, you shouldn’t put functions inside structures, instead you should use multiple dispatch. So Death structure should be rewritten as follows

struct Death <: Death_Module
    parameters::Dict
end

function process(a::Agent, d::Death)
  rand(Bernoulli(0.5))
end

and inside run function instead of

if simulation.Death.process(simulation.Agent)

you can write

if process(simulation.Agent, simulation.Death)

and same goes for Birth structure and related functions.

One more readability issue: it’s not very convenient that field names in Simulation have the same names as types (I am referring to Agent, Death, Birth fields). As a common convention, field names are written from the small letter, to distinguish them from type names.

manuka · February 17, 2021, 10:04pm

Thanks for your advice and suggestions. I really appreciate them. I will go over these carefully .

manuka · March 19, 2021, 8:49am

Could you explain why I should avoid immutable structure? I find that the code runs faster with less memory using mutable structures.

Skoffer · March 19, 2021, 8:56am

I’ve made a comparison not long ago: Performance disadvantages of structs? - #12 by Skoffer

Generally if you do everything correctly then immutable structures should work faster than mutable. If you see the opposite, usually it’s some non-idiomatic usage.

And of course, this distinction immutable/mutable only applies to small objects (like Agent in your examples). If you have hundreds of agents, they should be contained in mutable Vector/Matrix, because that’s what Vector is for.