I’m trying to run some sports game simulations and programmatically store the results in the most efficient manner. Even if it is not strictly necessary for my purpose, I am trying to learn how to do it right.
Right now my order of concern is: Memory > Readability > Speed. I came up with this, which is roughly how I would start out in R using a nested list:
###########################
#### Desired Structure ####
###########################
# -sim1
#   -metrics = {repeat(Float32, 3), Bool}(1x4)
#   -lgstats
#     -rep1 = {Int16, String15, repeat(Int32, 8)}(4x10)
#     -rep2 = {Int16, String15, repeat(Int32, 8)}(4x10)
#     -[...]
#   -team1
#     -pinfo  = {String15, Int8, String3, String3}(30x4)
#     -params = {Int8}(30x6)
#     -pstats
#       -rep1 = {Int16}(30x7)
#       -rep2 = {Int16}(30x7)
#       -[...]
#   -team2
#     -[...]
#   -team3
#     -[...]
#   -team4
#     -[...]
# -sim2
#   -[...]
# -sim3
#   -[...]
Each simulation (sim) uses a unique set of player parameters/ratings (params), and is replicated (rep) multiple times per sim step (the output is stochastic).
You can see there are various metrics (rmse, etc) and league-level stats (lgstats). Then for each team there is player info (pinfo) like name/etc, along with the player params and stats (pstats).
After much messing around, I managed to create this:
using DataFrames, InlineStrings
# Variable
nsim = 3
nrep = 2
nteam = 4
# Constant (see desired structure)
nmetrics = 4
nplayer = 30
nlgstats = 10
nparams = 6
npstats = 7
# Team-level Tuple
teamres = Tuple{DataFrame,               # pinfo
                DataFrame,               # params
                NTuple{nrep, DataFrame}  # pstats, rep[1:nrep]
               }                         # team1

# Simulation-level Tuple
simres = Tuple{DataFrame,                # nmetrics
               NTuple{nrep, DataFrame},  # lgstats, rep[1:nrep]
               NTuple{nteam, teamres}    # teams, team[1:nteam]
              }                          # sim1

# Final Result
allres = NTuple{nsim, simres}
When run, it seems to work:
julia> # Team-level Tuple
       teamres = Tuple{DataFrame,               # pinfo
                       DataFrame,               # params
                       NTuple{nrep, DataFrame}  # pstats, rep[1:nrep]
                      }                         # team1
Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}
julia> # Simulation-level Tuple
       simres = Tuple{DataFrame,                # nmetrics
                      NTuple{nrep, DataFrame},  # lgstats, rep[1:nrep]
                      NTuple{nteam, teamres}    # teams, team[1:nteam]
                     }                          # sim1
Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}
julia> # Final Result
allres = NTuple{nsim, simres}
Tuple{Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}, Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}, Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}}
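Note that each of these expressions evaluates to a type, not to a container holding data, which is why the REPL echoes a type signature. A minimal base-Julia sketch of that distinction, using a hypothetical two-field tuple type `T` that is not part of the code above:

```julia
# T is a type (analogous to teamres/simres/allres); x is a value of that type.
T = Tuple{Int, Float64}
x = (1, 2.0)

println(T isa DataType)   # the type itself is a DataType object
println(x isa T)          # the value is an instance of T
println(typeof(x) == T)   # typeof recovers the type from the value
```

A type like this describes what a result would look like; no storage for the data exists until a value is constructed.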
But I cannot figure out how to:
- Name the elements of these tuples (I could not get NamedTuple{} to work here).
- Preallocate dataframes of the desired sizes and types.
- Programmatically add my data to this DataType I created.
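To make the three points concrete, here is a rough sketch of what they might look like for a single team, building a value rather than a type. It is only one possible approach under stated assumptions: the helper `make_team`, the alias `TeamRes`, and every pinfo column name other than `name` are hypothetical placeholders, and params is omitted for brevity.

```julia
using DataFrames, InlineStrings

nrep, nplayer, npstats = 2, 30, 7

# Hypothetical named counterpart of the team-level Tuple type (params omitted).
TeamRes = @NamedTuple{pinfo::DataFrame, pstats::NTuple{nrep, DataFrame}}

# Hypothetical helper: preallocate one team's tables with placeholder values.
function make_team(nrep, nplayer, npstats)
    pinfo = DataFrame(
        name = fill(String15(""), nplayer),  # column names below are assumed
        num  = zeros(Int8, nplayer),
        pos  = fill(String3(""), nplayer),
        side = fill(String3(""), nplayer),
    )
    # One zero-filled Int16 table per replicate; :auto names columns x1, x2, ...
    pstats = ntuple(_ -> DataFrame(zeros(Int16, nplayer, npstats), :auto), nrep)
    return (pinfo = pinfo, pstats = pstats)
end

team = make_team(nrep, nplayer, npstats)

# Fill in place: the NamedTuple is immutable, but the DataFrames inside are not.
team.pinfo.name[1] = String15("Player One")
team.pstats[2][1, :x3] = 17   # rep 2, player 1, third stat

println(team isa TeamRes)
```

The same pattern would nest one level up for sims, with an `ntuple` or a `Vector` of such team values per simulation.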
And perhaps this is the totally wrong way to go about it. Please tell me if so, because I don’t know what I’m doing here. But even then, I would be interested in knowing how to make this method work (or why it won’t).