I started using Flux recently, and I am wondering when Flux.trainmode!() and Flux.testmode!() need to be called.
Am I correct that these functions do not need to be called explicitly during normal training and testing, no matter how complex the model is? I notice that the simple example in the Flux documentation does not use these functions at all.
Also, if Flux.trainmode!() and Flux.testmode!() do not need to be called explicitly during normal Flux usage, then when do these functions need to be called?
Have you seen Built-in Layers · Flux? The idea is that sometimes you want to override the automatic behaviour. For example, some training regimes temporarily freeze BatchNorm stat updates.
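For example, something like this (a minimal sketch with a made-up model, just to show the idea):

```julia
using Flux

# Hypothetical model: freeze only the BatchNorm's running-stat updates
# while the rest of the model keeps training as usual.
model = Chain(Dense(4 => 8), BatchNorm(8, relu), Dense(8 => 1))

testmode!(model[2])        # BatchNorm stats frozen; gradients still flow

# ... run some training steps here ...

testmode!(model[2], :auto) # restore the automatic behaviour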
Thanks, and sorry that I haven’t followed up for a while.
Then is it correct to say that it doesn't hurt to explicitly call trainmode!() during training and testmode!() during testing?
More generally, how can I write code that uses the automatic behavior and code that doesn't? Could you show two sets of code that achieve the same result, one that uses the automatic behavior and one that doesn't?
It doesn't hurt, but you'd have to remember to switch back afterwards if you want to replicate the automatic behaviour. Remember that if you explicitly disable the automatic behaviour by passing true or false as the mode argument, you can always re-enable it by passing :auto or nothing instead.
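Concretely, the mode argument works like this (a sketch, assuming a recent Flux version):

```julia
testmode!(model, true)   # force test mode: Dropout off, BatchNorm stats frozen
testmode!(model, false)  # force train mode (same as trainmode!(model))
testmode!(model, :auto)  # re-enable the automatic, gradient-aware behaviour
```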
There's no real difference. Just call trainmode!(model) before you want BatchNorm stats to update and Dropout to drop activations, and call testmode!(model) when you want them to stop. Most likely these calls would go before and after the gradient call in your training loop, respectively; see the sketch below.
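Here are two training-step sketches that end up doing the same thing, one relying on the automatic behaviour and one forcing the modes explicitly. The model, data, and loss are made up for illustration, and the code assumes a recent Flux version with the Flux.setup / Flux.update! training API:

```julia
using Flux

model = Chain(Dense(4 => 8, relu), Dropout(0.5), BatchNorm(8), Dense(8 => 1))
x, y = rand(Float32, 4, 16), rand(Float32, 1, 16)
opt_state = Flux.setup(Adam(), model)

# 1. Automatic behaviour: inside `gradient` the layers detect that they
#    are being differentiated, so Dropout drops and BatchNorm updates its
#    running stats; a plain forward pass outside `gradient` does neither.
grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
Flux.update!(opt_state, model, grads[1])
ŷ = model(x)      # test-mode forward pass, automatically

# 2. Explicit control: force the modes yourself and switch back after.
trainmode!(model)  # Dropout on, BatchNorm stats updating
grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
Flux.update!(opt_state, model, grads[1])
testmode!(model)   # Dropout off, BatchNorm stats frozen
ŷ = model(x)
```

Note that after the explicit version, the model stays forced into test mode until you call testmode!(model, :auto) (or trainmode! again), whereas the automatic version needs no bookkeeping at all.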