Why the result from Flux.jl is totally different from tf.Keras (with the same simple MLP)

A similar question was posted here a little while ago, and in that case the difference came down to Keras using its default batch size of 32 while the Flux training loop wasn't batching at all. I wonder if you're seeing the same thing.
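For context on why that matters: Keras's `model.fit` falls back to `batch_size=32` when you don't pass one, so an epoch performs many small gradient updates, whereas a naive full-batch Flux loop performs a single update per epoch. A rough sketch of the arithmetic (the dataset size `N` here is just an illustrative assumption, not from either thread):

```python
import math

def updates_per_epoch(n_samples: int, batch_size: int) -> int:
    # Each epoch visits every sample once, in chunks of `batch_size`;
    # the last chunk may be partial, hence the ceiling.
    return math.ceil(n_samples / batch_size)

# Hypothetical dataset size, for illustration only.
N = 1000

# Keras' model.fit default when batch_size is not specified.
keras_updates = updates_per_epoch(N, 32)

# A naive full-batch loop: one gradient step over all N samples.
full_batch_updates = updates_per_epoch(N, N)

print(keras_updates, full_batch_updates)  # 32 vs. 1 updates per epoch
```

With the same learning rate, 32 updates per epoch versus 1 can easily produce very different training curves, which is why matching the batching is usually the first thing to check.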

Here’s a link to that thread; the post just below the original has a version of the author’s code that matches the Keras/TF behavior.

Hope that helps! :slight_smile:
