Creating Ensemble Model(s) with Flux

One thing that isn’t clear to me from your description is whether the models in this ensemble are trained jointly, i.e. with a single combined loss such as loss = aggregate(loss_fn(model1), loss_fn(model2), ...), or separately, each with its own loss, as has been assumed above. Could you clarify that? Some pseudocode or a paper/site reference would be very helpful too.
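To make the distinction concrete, here is a rough sketch of the two setups I have in mind. Everything in it is a placeholder rather than your actual code: the two `Dense` members, the random data, `loss_fn`, and the choice of a plain average as the aggregate are all assumptions, and it assumes a recent Flux version with the explicit `Flux.setup` / `Flux.update!` API.

```julia
using Flux

# Hypothetical ensemble members and toy data (placeholders, not from the original post)
m1 = Dense(4 => 1)
m2 = Dense(4 => 1)
x = rand(Float32, 4, 8)
y = rand(Float32, 1, 8)

# Assumed per-model loss
loss_fn(m) = Flux.mse(m(x), y)

# (a) Separate training: each model has its own loss and its own optimiser state
for m in (m1, m2)
    opt = Flux.setup(Adam(), m)
    grads = Flux.gradient(loss_fn, m)
    Flux.update!(opt, m, grads[1])
end

# (b) Joint training: one loss aggregated over all members, one optimiser state
ensemble = (m1, m2)
joint_loss(ms) = sum(map(loss_fn, ms)) / length(ms)  # assumed aggregate: plain average
opt = Flux.setup(Adam(), ensemble)
grads = Flux.gradient(joint_loss, ensemble)
Flux.update!(opt, ensemble, grads[1])
```

With a linear aggregate like this average and no shared parameters, each model's gradient in (b) is just a scaled copy of its gradient in (a), so the two setups only really diverge if the aggregate is non-linear, is applied to the combined prediction, or the models share layers. That is why knowing which one you mean matters.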