I’m trying to run some sports game simulations and programmatically store the results in the most efficient manner. Even if it is not strictly necessary for my purpose, I am trying to learn how to do it right.
Right now my order of concern is: Memory > Readability > Speed. I came up with this, which is roughly how I would start out in R using a nested list:
###########################
#### Desired Structure ####
###########################
# sim1
#   metrics = {repeat(Float32, 3), Bool}(1x4)
#   lgstats
#     rep1 = {Int16, String15, repeat(Int32, 8)}(4x10)
#     rep2 = {Int16, String15, repeat(Int32, 8)}(4x10)
#     [...]
#   team1
#     pinfo  = {String15, Int8, String3, String3}(30x4)
#     params = {Int8}(30x6)
#     pstats
#       rep1 = {Int16}(30x7)
#       rep2 = {Int16}(30x7)
#       [...]
#   team2
#     [...]
#   team3
#     [...]
#   team4
#     [...]
# sim2
#   [...]
# sim3
#   [...]
Each simulation (sim) uses a unique set of player parameters/ratings (params), and is replicated (rep) multiple times per sim step (the output is stochastic).
You can see there are various metrics (rmse, etc.) and league-level stats (lgstats). Then for each team there is player info (pinfo) like name, etc., along with the player params and stats (pstats).
After much messing around, I managed to create this:
using DataFrames, InlineStrings
# Variable
nsim  = 3
nrep  = 2
nteam = 4
# Constant (see desired structure)
nmetrics = 4
nplayer  = 30
nlgstats = 10
nparams  = 6
npstats  = 7
# Team-level Tuple
teamres = Tuple{DataFrame,               # pinfo
                DataFrame,               # params
                NTuple{nrep, DataFrame}  # pstats, rep[1:nrep]
               }                         # team1
# Simulation-level Tuple
simres = Tuple{DataFrame,                # metrics
               NTuple{nrep, DataFrame},  # lgstats, rep[1:nrep]
               NTuple{nteam, teamres}    # teams, team[1:nteam]
              }                          # sim1
# Final Result
allres = NTuple{nsim, simres}
When run, it seems to work:
julia> # Team-level Tuple
       teamres = Tuple{DataFrame,               # pinfo
                       DataFrame,               # params
                       NTuple{nrep, DataFrame}  # pstats, rep[1:nrep]
                      }                         # team1
Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}
julia> # Simulation-level Tuple
       simres = Tuple{DataFrame,                # metrics
                      NTuple{nrep, DataFrame},  # lgstats, rep[1:nrep]
                      NTuple{nteam, teamres}    # teams, team[1:nteam]
                     }                          # sim1
Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}
julia> # Final Result
       allres = NTuple{nsim, simres}
Tuple{Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}, Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}, Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}}
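Worth noting: each of these objects appears to be a type (a description of a shape) rather than a container that holds data. A minimal sketch of the distinction follows, with Vector{Float64} standing in for DataFrame. The stand-in and the name Table are assumptions made only so the snippet runs without any packages.

```julia
# Stand-in for DataFrame (hypothetical, for illustration only)
const Table = Vector{Float64}

nrep = 2

# Same layout as the team-level tuple: pinfo, params, pstats reps
teamres = Tuple{Table, Table, NTuple{nrep, Table}}

# teamres is a type: it describes a layout but stores nothing
println(teamres isa DataType)

# A value with that layout has to be constructed separately
team1 = (zeros(3), zeros(3), ntuple(_ -> zeros(3), nrep))
println(team1 isa teamres)
```

So there is nothing to "add data to" in teamres itself; data would go into a value such as team1 that matches the type.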
But I cannot figure out how to:

Name the elements of these tuples (I could not get NamedTuple{} to work here).

Preallocate dataframes of the desired sizes and types.

Programmatically add my data to this DataType I created.
And perhaps this is the totally wrong way to go about it. Please tell me if so, because I don’t know what I’m doing here. But even then, I would be interested in knowing how to make this method work (or why it won’t).
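For reference, here is a minimal sketch of one way the three points above might be approached: named tuples built from values (rather than from types), dataframes preallocated with typed columns, and results written in by name and index. The helper newteam and the pinfo column names (name, age, pos, hand) are hypothetical placeholders, not anything defined earlier, and this is only one possible layout.

```julia
using DataFrames, InlineStrings

nrep, nteam = 2, 4
nplayer, nparams, npstats = 30, 6, 7

# Hypothetical helper: build one preallocated team as a named tuple.
# The pinfo column names are placeholders chosen for illustration.
function newteam()
    pinfo = DataFrame(name = fill(String15(""), nplayer),
                      age  = zeros(Int8, nplayer),
                      pos  = fill(String3(""), nplayer),
                      hand = fill(String3(""), nplayer))
    # :auto generates the column names x1, x2, ...
    params = DataFrame(zeros(Int8, nplayer, nparams), :auto)
    pstats = ntuple(_ -> DataFrame(zeros(Int16, nplayer, npstats), :auto), nrep)
    return (pinfo = pinfo, params = params, pstats = pstats)
end

teams = ntuple(_ -> newteam(), nteam)

# The named tuple is immutable, but the dataframes inside it are not,
# so results can be written in place.
teams[1].pinfo[1, :name] = String15("Player One")
teams[1].pstats[2][1, :x1] = 12   # rep 2, player 1, first stat column
```

If the number of reps or teams changes between runs, a Vector in place of ntuple may be simpler, since the length of a tuple is part of its type.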