Questions about a trained model: (1) testmode! vs. excluding Dropout layers; (2) size of the BSON file

(1) I trained the NN model and want to use it further, i.e., load it back from disk and run inference. Which is the better way to predict: (i) keep the Dropout layers and call Flux.testmode!, or (ii) rebuild the model without Dropout layers and skip Flux.testmode! altogether?
(2) My model is based on Transformers.BERT with additional layers, and the saved BSON file is huge. Is that because the file contains more than just the model structure? Would it be better to save only the model weights?

  1. Flux already does not run dropout or update normalization layers like BatchNorm during inference. You'd only need to call testmode! explicitly if you'd previously forced trainmode! during inference (see the first sketch after this list).

  2. BERT is a pretty big model :slight_smile: . The BSON file does indeed contain the entire model and not just the structure, IIRC, but the non-structural components should add negligible overhead compared to the weights. You could try compressing the generated file separately to see if that helps; the second sketch below shows how to save only the weights instead.
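
For point 1, a minimal sketch of that behaviour; the toy Chain, layer sizes, and input here are made up purely for illustration:

```julia
using Flux

# Toy model with a Dropout layer; the architecture and sizes are arbitrary.
model = Chain(Dense(10, 32, relu), Dropout(0.5), Dense(32, 2))
x = rand(Float32, 10, 4)

# In a plain forward pass (no gradient being taken), dropout is already
# inactive, so repeated calls give identical outputs.
y1 = model(x)
y2 = model(x)
@assert y1 == y2

# Explicit mode switches are only needed if a mode was forced earlier:
Flux.trainmode!(model)   # forces dropout on even outside of training
Flux.testmode!(model)    # switches it back off again for inference
```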
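
For point 2, a sketch of saving only the weights rather than the whole struct, using Flux.params / Flux.loadparams! (the API current in the Flux versions of this era; newer releases point to Flux.state / Flux.loadmodel! instead). The file names and the build_model constructor are just placeholders, and this won't shrink the file much when the weights dominate, but it does decouple the saved file from the model's type definitions:

```julia
using Flux, BSON
using BSON: @save, @load

# Toy architecture standing in for "BERT + extra layers"; sizes are arbitrary.
build_model() = Chain(Dense(10, 32, relu), Dropout(0.5), Dense(32, 2))

model = build_model()

# Save only the parameter arrays, not the whole model struct.
weights = collect(Flux.params(model))
@save "weights.bson" weights

# Later / elsewhere: rebuild the architecture from code,
# then copy the saved arrays back into it.
model2 = build_model()
@load "weights.bson" weights
Flux.loadparams!(model2, weights)
```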

> Flux already does not run dropout or update normalization layers like BatchNorm during inference. […]

Thanks!

> The BSON file does indeed contain the entire model and not just the structure, but the non-structural components should add negligible overhead compared to the weights. […]
Do weights also load with the model? If I save the model, load it back, and call params on it, I get an empty array for the weights:

```julia
using Flux, BSON
using BSON: @save

@save "model.bson" model
model2 = BSON.load("model.bson", @__MODULE__)
@show params(model2)
```

They should. I would check to see if the actual arrays on the returned model struct match the values you expect instead of pulling them all into an opaque bag with params.
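
For example, a minimal check along those lines, assuming the file was written with `@save "model.bson" model` as in the snippet above (note that BSON.load returns a Dict keyed by the saved variable names, so the model struct has to be pulled out of it before inspecting its fields):

```julia
using Flux, BSON

# Assumes "model.bson" was written earlier with `@save "model.bson" model`
# and that `model` (the original) is still in memory for comparison.
dict   = BSON.load("model.bson", @__MODULE__)   # BSON.load returns a Dict
model2 = dict[:model]                           # pull the struct out by its saved name

# Spot-check concrete arrays instead of the opaque params() bag.
# The layer/field access below assumes a Chain of Dense layers (Flux ≥ 0.12
# field names); adapt the indexing to your actual BERT-based architecture.
@show size(model2.layers[1].weight)
@show model2.layers[1].weight ≈ model.layers[1].weight
```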