Just to close this up: the problem was caused by the non-trainable parameters in BatchNorm (the running mean and variance). Flux.params collects only the trainable parameters, so the non-trainable ones were never copied.
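A minimal sketch of what was going on, assuming the BatchNorm field names `μ` and `σ²` used in recent Flux versions (exact fields and whether `Flux.loadmodel!` is available depend on your Flux version):

```julia
using Flux

bn_src = BatchNorm(4)
bn_dst = BatchNorm(4)

# Pretend bn_src accumulated running statistics during training.
bn_src.μ  .= randn(Float32, 4)
bn_src.σ² .= abs.(randn(Float32, 4)) .+ 1f0

# Flux.params only collects the trainable β and γ, so copying via params
# leaves the running statistics μ and σ² at their initial values.
Flux.loadparams!(bn_dst, Flux.params(bn_src))
@show bn_dst.μ   # still zeros: the running mean was not copied

# Copy the non-trainable statistics explicitly...
bn_dst.μ  .= bn_src.μ
bn_dst.σ² .= bn_src.σ²

# ...or, on recent Flux versions, copy the full state in one call:
# Flux.loadmodel!(bn_dst, bn_src)
```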