ANN: MetaFields.jl and Flatten.jl - packages for writing composable models


#1

I’ve been putting together a few biophysical models over the last year, and have constantly encountered what I feel to be the core software problem in my field: sharing high-performance code in general ways never happens. This means there is no real software ecosystem, a huge amount of cut and paste, and massive monolithic architectures. It’s a social problem with a technical cause.

One solution to this is composing model parameters into nested hierarchies of generic types, and using method dispatch to run custom functions depending on the types. Julia is amazing for this, and I have a proof of concept: DynamicEnergyBudgets.jl and Photosynthesis.jl.

Why Flatten and MetaFields?

The problem with using composition is that numerical tools need parameters to be provided in flat vectors, not nested in hierarchies. Bayesian methods need not only the parameters to be flat, but also priors for each parameter!

I want all of that to compose so running multiple model combinations can be automated, without writing any boilerplate conversion code. I also want to supply defaults for everything, but allow easy user overrides.

The solution I have ended up with is a way of flattening heterogeneous structs, and metadata about them, into vectors:


MetaFields lets you attach metadata to fields, just like tags in Go but more flexible. It lets you choose which fields to flatten, and attach extra data such as priors to each field without actually storing it on the struct.

Flatten.jl will flatten nested type hierarchies to vectors or tuples, and rebuild them. It acts as an intermediary between a nested model and, say, DiffEqSensitivity, to insert Dual numbers into the nested types. Credit goes to Robin Deits and Jan Weidner for a lot of the ideas; the original code, minus the MetaFields integration, was written by Deits.
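
To make the flatten/rebuild round trip concrete, here is a minimal sketch in plain Julia. It is not Flatten.jl's implementation or API: the `Leaf` and `Plant` types and the `flat` and `rebuild` functions are hypothetical names made up for illustration, and the sketch assumes every leaf field is a number.

```julia
# Hypothetical toy types, not taken from DynamicEnergyBudgets.jl or Photosynthesis.jl.
struct Leaf
    width::Float64
    g0::Float64
end

struct Plant
    leaf::Leaf
    kappa::Float64
end

# Collect every numeric leaf of a nested struct, depth first.
flat(x::Number) = [float(x)]
flat(x) = reduce(vcat, [flat(getfield(x, i)) for i in 1:nfields(x)])

# Rebuild a value of type T from the vector, reading from position i.
# Returns the rebuilt value and the next unread position.
rebuild(::Type{T}, v, i) where {T<:Number} = (convert(T, v[i]), i + 1)
function rebuild(::Type{T}, v, i) where {T}
    args = Any[]
    for FT in fieldtypes(T)
        val, i = rebuild(FT, v, i)
        push!(args, val)
    end
    return T(args...), i
end

p = Plant(Leaf(0.05, 0.03), 0.6)
v = flat(p)                        # a flat Vector{Float64} a solver can work with
p2, _ = rebuild(Plant, 2 .* v, 1)  # same nested shape, new parameter values
```

This version allocates and is resolved at run time; it only shows the shape of the problem that a generated, type-stable implementation has to solve.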

The two key features of Flatten are that you can skip fields in the hierarchy using a @flattenable metafield, and that you can flatten metafields and other properties, such as the field name or parent struct name, into vectors.
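
The idea of field metadata that lives outside the struct can be sketched with ordinary method dispatch. This is an assumption-laden illustration, not the MetaFields.jl macros or the real `metaflatten`: the `Conductance` type and the `flattenable`, `units`, `fieldname_of`, `metaflat` and `values_of` functions are all hypothetical, and nesting is left out to keep it short.

```julia
# Hypothetical example type.
struct Conductance
    leafwidth::Float64
    tref::Float64
end

# Metadata is stored in methods keyed on the type and the field name,
# so nothing extra is stored on the struct. Fallback methods supply defaults.
flattenable(::Type, ::Val) = true
units(::Type, ::Val) = ""
fieldname_of(::Type, ::Val{N}) where {N} = N

# Overrides for particular fields.
flattenable(::Type{Conductance}, ::Val{:tref}) = false
units(::Type{Conductance}, ::Val{:leafwidth}) = "m"
units(::Type{Conductance}, ::Val{:tref}) = "°C"

# Flatten the values of the fields marked flattenable.
values_of(x::T) where {T} =
    [getfield(x, n) for n in fieldnames(T) if flattenable(T, Val(n))]

# Flatten a piece of metadata f for the same fields, in the same order.
metaflat(f, x::T) where {T} =
    [f(T, Val(n)) for n in fieldnames(T) if flattenable(T, Val(n))]

c = Conductance(0.05, 25.0)
vals  = values_of(c)               # tref is skipped
unit  = metaflat(units, c)
names = metaflat(fieldname_of, c)
```

Because values and metadata are filtered by the same `flattenable` check, the resulting vectors line up element by element and can be used as columns of one table.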

Any comments or improvements would be really helpful before I put these into METADATA. I’m also going to discuss some of the more general ideas in a paper and at a conference in the next few months, so feedback would be really appreciated.

As an example, I’ll take a hierarchical, nested data type with metafield markup (example source), and flatten chosen fields and metadata about them to a dataframe, with intermixed parameters and metafields from multiple packages and custom overrides:

julia> organism = DynamicEnergyBudgets.FvCBPlant();                                                                    
                                                                                                                       
julia> params = organism;                                                                                              
                                                                                                                       
julia> def = Flatten.flatten(Vector, params);                                                                                                               
                                                                                                                       
julia> pri = metaflatten(Vector, params, DynamicEnergyBudgets.prior);                                                  
                                                                                                                       
julia> fnames = metaflatten(Vector, params, fieldname_meta);                                                           
                                                                                                                       
julia> unit = metaflatten(Vector, params, DynamicEnergyBudgets.units);                                                 
                                                                                                                       
julia> form = metaflatten(Vector, params, fieldparent_meta);                                                           
                                                                                                                       
julia> lab = metaflatten(Vector, params, DynamicEnergyBudgets.label);                                                  
                                                                                                                       
julia> t = DataFrame(Formulation=form, Name=fnames, Default=def, Units=unit, Prior=pri, Label=lab)
82×6 DataFrames.DataFrame. Omitted printing of 1 columns
│ Row │ Formulation                  │ Name        │ Default  │ Units          │ Prior                                           │
├─────┼──────────────────────────────┼─────────────┼──────────┼────────────────┼─────────────────────────────────────────────────┤
│ 1   │ YingPingRadiationConductance │ rdfipt      │ 1.0      │                │                                                 │
│ 2   │ YingPingRadiationConductance │ tuipt       │ 1.0      │                │                                                 │
│ 3   │ YingPingRadiationConductance │ tdipt       │ 1.0      │                │                                                 │
│ 4   │ BoundaryConductance          │ leafwidth   │ 0.05     │ m              │ Distributions.Gamma{Float64}(α=2.0, θ=0.025)    │
│ 5   │ BallBerryStomatalConductance │ gamma       │ 0.0      │ μmol mol^-1    │ Distributions.Gamma{Float64}(α=1.0, θ=2.0)      │
│ 6   │ BallBerryStomatalConductance │ g1          │ 7.0      │                │ Distributions.Gamma{Float64}(α=10.0, θ=0.7)     │
│ 7   │ PotentialSoilData            │ swpexp      │ 1.0      │                │                                                 │
│ 8   │ BallBerryModel               │ g0          │ 0.03     │ mol m^-2 s^-1  │ Distributions.Gamma{Float64}(α=10.0, θ=0.003)   │
│ 9   │ Jmax                         │ jmax25      │ 184.0    │ μmol m^-2 s^-1 │ Distributions.Gamma{Float64}(α=100.0, θ=1.84)   │
│ 10  │ Jmax                         │ delsj       │ 640.02   │ J K^-1 mol^-1  │ Distributions.Gamma{Float64}(α=100.0, θ=6.4002) │
│ 11  │ Jmax                         │ eavj        │ 37259.0  │ J mol^-1       │ Distributions.Gamma{Float64}(α=100.0, θ=372.59) │
│ 12  │ Jmax                         │ edvj        │ 200000.0 │ J mol^-1       │ Distributions.Gamma{Float64}(α=100.0, θ=2000.0) │
│ 13  │ NoOptimumVcmax               │ vcmax25     │ 110.0    │ μmol m^-2 s^-1 │ Distributions.Gamma{Float64}(α=100.0, θ=1.14)   │
│ 14  │ NoOptimumVcmax               │ eavc        │ 47590.0  │ J mol^-1       │ Distributions.Gamma{Float64}(α=100.0, θ=475.9)  │
│ 15  │ BernacchiCompensation        │ Kc25        │ 404.9    │ μmol mol^-1    │ Distributions.Gamma{Float64}(α=10.0, θ=40.49)   │
│ 16  │ BernacchiCompensation        │ Ko25        │ 278400.0 │ μmol mol^-1    │ Distributions.Gamma{Float64}(α=10.0, θ=27840.0) │
│ 17  │ BernacchiCompensation        │ Γ☆25        │ 42.75    │ μmol mol^-1    │ Distributions.Gamma{Float64}(α=10.0, θ=4.275)   │
│ 18  │ BernacchiCompensation        │ ΔHa_Kc      │ 79.43    │ kJ mol^-1      │ Distributions.Gamma{Float64}(α=10.0, θ=7.943)   │
│ 19  │ BernacchiCompensation        │ ΔHa_Ko      │ 36.38    │ kJ mol^-1      │ Distributions.Gamma{Float64}(α=10.0, θ=3.638)   │
│ 20  │ BernacchiCompensation        │ ΔHa_Γ☆      │ 37.83    │ kJ mol^-1      │ Distributions.Gamma{Float64}(α=10.0, θ=3.783)   │
│ 21  │ BernacchiCompensation        │ tref        │ 25.0     │ °C             │                                                 │
│ 22  │ RubiscoRegen                 │ theta       │ 0.4      │                │ Distributions.Beta{Float64}(α=8.0, β=12.0)      │
│ 23  │ RubiscoRegen                 │ ajq         │ 0.324    │                │ Distributions.Beta{Float64}(α=4.0, β=8.3)       │
│ 24  │ Respiration                  │ q10f        │ 0.67     │ K^-1           │                                                 │
│ 25  │ Respiration                  │ dayresp     │ 1.0      │                │ Distributions.Beta{Float64}(α=5.0, β=1.0)       │
│ 26  │ Respiration                  │ rd0         │ 0.9      │ μmol m^-2 s^-1 │ Distributions.Gamma{Float64}(α=10.0, θ=0.09)    │
│ 27  │ Respiration                  │ tbelow      │ -100.0   │ °C             │                                                 │
⋮
│ 55  │ KooijmanArea                 │ M_Vscaling  │ 400.0    │ mol            │ Distributions.Gamma{Float64}(α=2.0, θ=0.2)      │
│ 56  │ SqrtAllometry                │ size        │ 0.1      │ m              │ Distributions.Gamma{Float64}(α=2.0, θ=0.2)      │
│ 57  │ Maturity                     │ j_E_rep_mai │ 0.001    │ dy^-1          │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 58  │ Maturity                     │ κrep        │ 0.05     │                │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 59  │ Maturity                     │ M_Vrep      │ 10.0     │ mol            │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 60  │ Maturity                     │ w_M         │ 25.0     │ g mol^-1       │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 61  │ Maturity                     │ n_N_M       │ 10.0     │                │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 62  │ Translocation                │ proportions │ 1.0      │                │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 63  │ Params                       │ M_Vgerm     │ 0.01     │ mol            │ Distributions.Gamma{Float64}(α=2.0, θ=2.0)      │
│ 64  │ Params                       │ κsoma       │ 0.6      │                │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 65  │ Params                       │ y_P_V       │ 0.02     │                │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 66  │ Params                       │ y_V_E       │ 0.7      │                │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 67  │ Params                       │ y_E_ET      │ 0.8      │                │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 68  │ Params                       │ y_EC_ECT    │ 1.0      │                │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 69  │ Params                       │ y_EN_ENT    │ 1.0      │                │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 70  │ Params                       │ j_E_mai     │ 0.001    │ dy^-1          │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 71  │ Params                       │ j_P_mai     │ 0.01     │ dy^-1          │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 72  │ Params                       │ k_E         │ 0.2      │ dy^-1          │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 73  │ Params                       │ k_EC        │ 0.2      │ dy^-1          │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 74  │ Params                       │ k_EN        │ 0.2      │ dy^-1          │ Distributions.Beta{Float64}(α=2.0, β=2.0)       │
│ 75  │ SharedParams                 │ n_N_N       │ 10.0     │                │ Distributions.Gamma{Float64}(α=2.0, θ=2.0)      │
│ 76  │ SharedParams                 │ w_P         │ 25.0     │ g mol^-1       │ Distributions.Gamma{Float64}(α=2.0, θ=2.0)      │
│ 77  │ SharedParams                 │ w_V         │ 25.0     │ g mol^-1       │ Distributions.Gamma{Float64}(α=2.0, θ=2.0)      │
│ 78  │ SharedParams                 │ w_C         │ 25.0     │ g mol^-1       │ Distributions.Gamma{Float64}(α=2.0, θ=2.0)      │
│ 79  │ SharedParams                 │ w_N         │ 25.0     │ g mol^-1       │ Distributions.Gamma{Float64}(α=2.0, θ=2.0)      │
│ 80  │ SharedParams                 │ w_E         │ 25.0     │ g mol^-1       │ Distributions.Gamma{Float64}(α=2.0, θ=2.0)      │
│ 81  │ SharedParams                 │ y_E_CH_NO   │ 1.5      │                │ Distributions.Gamma{Float64}(α=2.0, θ=2.0)      │
│ 82  │ SharedParams                 │ y_E_EN      │ 1.5      │                │ Distributions.Gamma{Float64}(α=2.0, θ=2.0)      │

#2

That’s pretty cool. I may be building off of that to get some easily broadcastable structures.


#3

It looks very interesting!

It does seem like there’s a bit of overlap across different projects:

  • The Columns type from IndexedTables can turn vectors of tuples / named tuples / pairs (potentially nested) into a tuple or named tuple of arrays
  • StructofArrays can turn an array of structs into a struct of arrays (it also works with nesting), but it seems a bit unmaintained (though I may be wrong) and I am not sure if it works on Julia master

At some point I had created a small StructArrays package trying to extend StructofArrays in a way that it could be used to replace the job of Columns in IndexedTables (and to take advantage of dot overloading, so that columns can be accessed with x.colname).

I was wondering whether Flatten and StructArrays can be joined (so that Flatten takes care of the flattening and StructArrays of creating a columnar storage with fast row iteration).


#4

Great, let me know if you have questions or need any changes! A large part of the motivation was integrating with the DiffEq ecosystem and parametrised functions.

@piever I’m not sure how IndexedTables or StructOfArrays work, so I can’t really offer any ideas there… This iteration of Flatten is mostly interesting when you don’t want to flatten all of the fields on the nested struct, or when you also have field metadata to flatten, but I’m not sure what the scope is really!

On the other hand, I’ve been happily flattening nested models to build InteractBulma interfaces all day :slight_smile:


#5

I have also put together some basic low-level machinery for this here:


#6

Cool! Seems like this is a common requirement. Handling arrays could be useful too. Is packVec type-stable?


#7

It should be.


#8

It’s not with units. When everything is Float64 it is, though.


#9

Ah: I can add that. Please open an issue.


#10

Because ForwardDiff etc. don’t accept units, I’ve opted for flattening to their internal float values, and reconstructing from float to units again. It makes type stability a lot easier, and possible with vectors. Having units in a metafield means you can just flatten the units separately and reapply them to the vector if you actually need them…
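
A rough sketch of why stripping units helps, using a toy stand-in for a unit type rather than Unitful.jl or the actual Flatten internals. The `Quantity` type and the `unitof` function are hypothetical, and the unit is just a `Symbol` type parameter:

```julia
# Hypothetical toy quantity: the unit is carried in the type parameter U.
struct Quantity{U}
    val::Float64
end
unitof(::Quantity{U}) where {U} = U

params = (Quantity{:m}(0.05), Quantity{:mol}(10.0))

# A vector holding quantities with different units has an abstract element type...
mixed = [params...]

# ...while the stripped values form a concrete Vector{Float64}.
vals = [q.val for q in params]

# The units are flattened separately, in the same order, and reapplied afterwards.
us = map(unitof, params)
restored = map((u, v) -> Quantity{u}(v), us, Tuple(vals))
```

Here `isconcretetype(eltype(mixed))` is false while `isconcretetype(eltype(vals))` is true, which is the difference that matters once the values go into a vector. Note that the reapply step above looks the unit up at run time, so it is itself not type-stable; it only illustrates the round trip.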


#11

In case anyone is interested, I’ve generalised the internals of the flattening/reconstruction process in https://github.com/rafaqz/Nested.jl

This should make it relatively easy to build these kinds of high-performance nested @generated functions.