Creating Ensemble Model(s) with Flux

Hello,

I am trying to build a very basic ensemble model with Flux. Since the network structure does not change, all weak learners are the same. I am facing a problem with the loss function definitions: it seems like I have to define a new loss function for each separate model. Here is an MWE of what I mean:

A basic NN model:


using Flux
using Flux.Losses
using Flux: params

x = randn(Float32, 300, 10)
y = randn(Float32, 300, 10) # a regression model will be built!

model = Chain(Dense(300, 300))

apply_model(model, x) = model(x) # need a separate apply_model

loss(x, y) = mse(apply_model(model, x), y) # need a way to change apply_model for each ensemble member

# then training
Flux.train!(loss, params(model), [(x, y)], Flux.Adam())

Since I need to build many of these models, I would have to define a separate loss function for each one, which is not practical. Because of that, I decided to use metaprogramming to generate the functions automatically.

For instance, if someone wants to build n models:

n = 5
models = [Chain(Dense(300, 300)) for i in 1:n] # models are built

# generating an apply_model for each model
for i in 1:n
    fname = Symbol("apply_model$(i)")
    @eval $fname(inputs) = apply_model(models[$i], inputs)
end

# generating a loss for each model ?


I couldn’t manage to create the loss functions for each model automatically. This is not really a Flux issue, but more of a programming/metaprogramming question. Could someone show me a proper way of doing it?

Happy Sunday :slight_smile:

I think you want something like this:

makeloss(m) = (x, y) -> mse(m(x), y)
Flux.train!(makeloss(model), params(model), [(x, y)], Flux.Adam())
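
If you have a whole vector of models (as in your question), the same helper gives each one its own closed-over loss. A minimal sketch, assuming models is a vector of Chains and x, y are as in your MWE:

# hypothetical: one closure per model, each capturing its own model m
losses = [makeloss(m) for m in models]
for (m, l) in zip(models, losses)
    Flux.train!(l, params(m), [(x, y)], Flux.Adam())
end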

Or you can just make the anonymous function directly, perhaps with a do block. Something like:

models = [Chain(Dense(300, 300)) for _ in 1:10]
opt = Flux.Adam()  # should be ok to share this

for m in models
  Flux.train!(params(m), [(x, y)], opt) do x1, y1
    mse(m(x1), y1)
  end
end

Perhaps I’m missing something but why not build an Ensemble structure with arrays of models, losses, and optimizers? Something along those lines is what I use to work with ensembles in Flux:

using Flux, Statistics  # Statistics provides mean() for the ensemble prediction below

struct NNEnsemble
    models
    optimizers
    losses
end

n = 5 # Number of models
# Create the ensemble
ensemble = NNEnsemble(
    [Dense(300, 1) for _ ∈ 1:n],
    [ADAM() for _ ∈ 1:n],
    [Flux.Losses.mse for _ ∈ 1:n]
)
# Specify length function for NNEnsemble
Base.length(ensemble::NNEnsemble) = length(ensemble.models)
# Train the ensemble
function train_ensemble!(e::NNEnsemble, x, y)
    for i ∈ 1:length(e)
        ps = Flux.params(e.models[i])
        gs = gradient(ps) do 
            e.losses[i](e.models[i](x), y)
        end
        Flux.update!(e.optimizers[i], ps, gs)
    end
end

# Generate random data
x, y = randn(Float32, 300, 100), randn(Float32, 1, 100)

# Train the ensemble
train_ensemble!(ensemble, x, y)

# Get predictions for ensemble
mean(m(x) for m ∈ ensemble.models)

Edit: of course, if the goal is to have a single loss function and avoid this array, using the following also works:

struct NNEnsemble
    models
    optimizers
end

n = 5 # Number of models
# Create the ensemble
ensemble = NNEnsemble(
    [Dense(300, 1) for _ ∈ 1:n],
    [ADAM() for _ ∈ 1:n]
)

and use Flux.Losses.mse(e.models[i](x), y) in the gradient computation.
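
For completeness, a sketch of the corresponding training loop for this two-field struct, with mse hard-coded instead of read from a stored array (same shape as the train_ensemble! above):

function train_ensemble!(e::NNEnsemble, x, y)
    for i ∈ 1:length(e.models)
        ps = Flux.params(e.models[i])
        gs = gradient(ps) do
            Flux.Losses.mse(e.models[i](x), y)  # single shared loss function
        end
        Flux.update!(e.optimizers[i], ps, gs)
    end
end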


This looks fine too.

Note that [Flux.Losses.mse for _ ∈ 1:n] is just a vector containing exactly the same function n times.

I think that points to the weirdness of the present train! interface, or how it’s introduced. Defining loss(x, y) = mse(model(x), y) closes over the model when you define it, and this (together with the dictionary made by params(model)) is how train! knows about the model. It’s all rather weirdly indirect, and global.
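
To make that indirection concrete, here is a minimal sketch of the implicit-params pattern described above, assuming model, x, y from the original MWE; it is roughly what one step of train! does:

loss(x, y) = Flux.Losses.mse(model(x), y)  # closes over the global `model`
ps = Flux.params(model)                    # collects the arrays inside that same model
gs = gradient(() -> loss(x, y), ps)        # differentiate w.r.t. the captured arrays
Flux.update!(Flux.Adam(), ps, gs)          # one manual update step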

(This is all Flux 0.13, for future reference!)

This should work fine, and would allow (say) a different learning rate per model. But the momentum and other state stored in Adam again use global references to the arrays in the models, keyed by objectid(array). Using the same Adam for several models will just accumulate entries for their various arrays.
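
A small sketch of the per-model-optimiser variant mentioned above, so no optimiser state is shared between models (the learning-rate values here are just placeholders):

etas = range(1e-4, 1e-2; length = length(models))  # hypothetical per-model learning rates
opts = [Flux.Adam(eta) for eta in etas]
for (m, opt) in zip(models, opts)
    Flux.train!((x1, y1) -> Flux.Losses.mse(m(x1), y1), Flux.params(m), [(x, y)], opt)
end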


One thing that isn’t clear to me from your description is whether the models in this ensemble are trained jointly (i.e. loss = aggregate(loss_fn(model1), loss_fn(model2), ...)) or separately as has been assumed above. Could you clarify that? Some pseudocode or a paper/site reference would be very helpful too.
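
For reference, the joint alternative described there might look roughly like this sketch, using sum as the aggregate and models, x, y as in the earlier replies:

# hypothetical sketch of joint training: one aggregated loss over all models
ps = Flux.params(models...)                 # parameters of every model at once
gs = gradient(ps) do
    sum(Flux.Losses.mse(m(x), y) for m in models)   # aggregate = sum here
end
Flux.update!(Flux.Adam(), ps, gs)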

I didn’t know this was possible. Could you please name this pattern? It looks really weird at first glance :slight_smile: I mean, how can the function see the (x, y) tuple without it being specified in the function definition?

This is the exact definition of my problem. Thank you.

No, I was planning to train them with distinct/separate loss functions; all models are isolated from each other. From your comment, I guess you’re asking whether I mean joining all the loss function values (by calculating the mean or some other aggregation), so that the error is then backpropagated to each model individually. Am I right?

I was making sure you weren’t trying to do this instead, because it’s quite common when training ensemble models. Since you aren’t, you can disregard this.

That’s just an anonymous function closure. (x, y) is not a tuple but a parameter list. If you’re familiar with closures/lambdas in other languages, this is that.
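
To illustrate, a tiny sketch (the sizes just mirror the MWE above):

m = Chain(Dense(300, 300))
f = (x, y) -> Flux.Losses.mse(m(x), y)  # anonymous function with two parameters, capturing m
f(randn(Float32, 300, 4), randn(Float32, 300, 4))  # called with two arguments, not one tuple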
