Training gets different results when using Flux.train() inside function

This probably has to do with your loss(x, y) definition. I suspect the m in the function body is not referencing the m created inside all_the_code but some other m in global scope. In any case, it is better to explicitly pass the model into the loss function, because depending on the scoping of m, you may be using a global variable, which can cause performance issues (and bugs like this one!). Instead, you should define

loss(x, y, m) = Flux.mse(m(x), y)

Then, when you call train!, you can use a closure over m:

Flux.train!((x, y) -> loss(x, y, m), ps, datatrain, opt, cb = throttle(evalcb, time_show))

This will close over the m in the scope where Flux.train! is called, so unless you are doing something really unusual, it should reference the m you expect.
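
Putting it together, here is a minimal self-contained sketch of that pattern, using the same implicit-params train! API as above. The layer sizes, random data, ADAM optimizer, and the 5-second throttle are placeholder assumptions, not your actual setup:

using Flux
using Flux: throttle

function all_the_code()
    # hypothetical stand-in data: 10 features, 100 samples
    xtrain = rand(Float32, 10, 100)
    ytrain = rand(Float32, 1, 100)
    datatrain = [(xtrain, ytrain)]

    m = Chain(Dense(10, 32, relu), Dense(32, 1))

    # the model is an explicit argument, so nothing depends on a global m
    loss(x, y, m) = Flux.mse(m(x), y)

    ps = Flux.params(m)
    opt = ADAM()
    time_show = 5
    evalcb() = @show loss(xtrain, ytrain, m)

    for epoch in 1:100
        Flux.train!((x, y) -> loss(x, y, m), ps, datatrain, opt,
                    cb = throttle(evalcb, time_show))
    end
    return m
end

The key point is that loss only ever sees the m it is handed, so moving the code in and out of a function no longer changes which model gets trained.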