I’m trying to run some sports game simulations and programmatically store the results as efficiently as possible. Even though it is not strictly necessary for my purpose, I am trying to learn how to do it right.
Right now my order of concern here is: Memory > Readability > Speed. I came up with this, which is roughly how I would start out in R using a nested list:
###########################
#### Desired Structure ####
###########################
# -sim1
#     -metrics = {repeat(Float32, 3), Bool}(1x4)
#     -lgstats
#         -rep1 = {Int16, String15, repeat(Int32, 8)}(4x10)
#         -rep2 = {Int16, String15, repeat(Int32, 8)}(4x10)
#         -[...]
#     -team1
#         -pinfo  = {String15, Int8, String3, String3}(30x4)
#         -params = {Int8}(30x6)
#         -pstats
#             -rep1 = {Int16}(30x7)
#             -rep2 = {Int16}(30x7)
#             -[...]
#     -team2
#         -[...]
#     -team3
#         -[...]
#     -team4
#         -[...]
# -sim2
#     -[...]
# -sim3
#     -[...]
Each simulation (sim) uses a unique set of player parameters/ratings (params), and is replicated (rep) multiple times per sim step (the output is stochastic).
You can see there are various metrics (RMSE, etc.) and league-level stats (lgstats). Then for each team there is player info (pinfo) such as name, along with the player params and stats (pstats).
After much messing around, I managed to create this:
using DataFrames, InlineStrings
# Variable
nsim  = 3
nrep  = 2
nteam = 4
# Constant (see desired structure)
nmetrics = 4
nplayer  = 30
nlgstats = 10
nparams  = 6
npstats  = 7
# Team-level Tuple
teamres = Tuple{DataFrame,              # pinfo
                DataFrame,              # params
                NTuple{nrep, DataFrame} # pstats,  rep[1:nrep]
                }                       # team1
# Simulation-level Tuple
simres = Tuple{DataFrame,               # metrics
               NTuple{nrep, DataFrame}, # lgstats, rep[1:nrep]
               NTuple{nteam, teamres}   # teams,   team[1:nteam]
               }                        # sim1
# Final Result
allres = NTuple{nsim, simres}
When run, it seems to work:
julia> # Team-level Tuple
       teamres = Tuple{DataFrame,              # pinfo
                       DataFrame,              # params
                       NTuple{nrep, DataFrame} # pstats,  rep[1:nrep]
                       }                       # team1      
Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}
       
julia> # Simulation-level Tuple
       simres = Tuple{DataFrame,               # metrics
                      NTuple{nrep, DataFrame}, # lgstats, rep[1:nrep]
                      NTuple{nteam, teamres}   # teams,   team[1:nteam]
                      }                        # sim1       
Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}
      
julia> # Final Result
      allres = NTuple{nsim, simres}
Tuple{Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}, Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}, Tuple{DataFrame, Tuple{DataFrame, DataFrame}, NTuple{4, Tuple{DataFrame, DataFrame, Tuple{DataFrame, DataFrame}}}}}
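As far as I can tell, these definitions only describe types, and no tables are allocated yet. A small check seems to confirm that. It is only a sketch: it assumes the same `nrep = 2` as in my code and relies on `fieldcount` and `fieldtype` from Base.

```julia
using DataFrames

nrep = 2
teamres = Tuple{DataFrame, DataFrame, NTuple{nrep, DataFrame}}

teamres isa DataType        # it is a type, not a container holding data
fieldcount(teamres)         # number of slots in the tuple type
fieldtype(teamres, 3)       # type of the third slot (the pstats reps)
```

So any actual results would have to be built as a separate value that happens to match this type.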
But I cannot figure out how to:
- Name the elements of these tuples (I could not get NamedTuple{} to work here).
- Preallocate dataframes of the desired sizes and types.
- Programmatically add my data to this DataType I created.
Perhaps this is entirely the wrong way to go about it. Please tell me if so, because I don’t know what I’m doing here. Even then, I would still be interested in knowing how to make this method work (or why it won’t).
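To make the three points concrete, here is a minimal sketch of the kind of thing I am after at the team level. It is not a verified or idiomatic solution. The names `TeamRes` and `make_team`, the pinfo column names, and the sample values are all hypothetical. It assumes the `NamedTuple{names, types}` form for naming, zero-filled columns for preallocation, and in-place writes into the DataFrames for adding data.

```julia
using DataFrames, InlineStrings

nrep, nplayer, nparams, npstats = 2, 30, 6, 7

# Named counterpart of `teamres`: field names first, then a Tuple of field types
TeamRes = NamedTuple{(:pinfo, :params, :pstats),
                     Tuple{DataFrame, DataFrame, NTuple{nrep, DataFrame}}}

# Hypothetical helper: one team's tables, preallocated with the desired
# sizes and element types (the pinfo column names are made up)
function make_team()
    pinfo = DataFrame(name = fill(String15(""), nplayer),
                      age  = zeros(Int8, nplayer),
                      pos  = fill(String3(""), nplayer),
                      hand = fill(String3(""), nplayer))
    # :auto generates the column names x1, x2, ...
    params = DataFrame(zeros(Int8, nplayer, nparams), :auto)
    pstats = ntuple(_ -> DataFrame(zeros(Int16, nplayer, npstats), :auto), nrep)
    return (; pinfo, params, pstats)
end

team = make_team()
team isa TeamRes                 # the value matches the named type

# The tuple itself is immutable, but each DataFrame can be filled in place
team.pinfo[1, :name] = String15("Player A")
team.pstats[2][1, :x3] = 12
```

The same pattern could presumably be nested one level up for `simres` and `allres`, but I have not checked how it behaves in terms of memory.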