I created an article that attempts to explain the Flux example of a Convolutional Neural Network operating on the MNIST dataset. The article goes through each line and explains what it's doing in Julia. I'd appreciate any input on making it clearer for developers wanting to get into deep learning.
Nice write-up! There is only one part that I find a little confusing (I'm a deep learning noob, so it may be something obvious). The performance after training the network in what seems like the standard way, that is to say `Flux.train!(loss, train, opt, cb = evalcb)`, is OK but not spectacular (56% compared to a chance level of 10%). Yet you mention that:
If we run the data through 10 times, we start to approach accuracies of 96%
What does that mean exactly? Are you calling the same function `Flux.train!(loss, train, opt, cb = evalcb)`, are you passing different training datasets to the network, or something else? I find it confusing because, if the performance is suboptimal after the initial training because there was not enough training, I would expect some keyword in the `train!` function to specify when to stop training.
Thanks Piever,
I appreciate you taking the time to give insightful feedback. It approaches 96% accuracy after running the same dataset (60,000 images) through 10 times. I suspect you would see a similar convergence if you ran 10 different MNIST datasets through. The network simply needs to see a lot of data before it has made enough corrections to its weights to classify the images well.
The `train!` function just loops once through the data. Running the same `train!` function with the same data several times will improve the result.
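To illustrate the point about repeated passes, here is a minimal sketch in plain Julia that does not use Flux at all. The function `one_pass!` is a hypothetical stand-in for a single `train!` call, and the toy dataset, the learning rate `lr`, and the linear model are all assumptions made up for this example, not anything from the article or the Flux model zoo. It only shows the general idea: one loop over the data moves the weights part of the way, and looping over the same data again keeps reducing the loss.

```julia
# Hypothetical stand-in for a single `train!` call: one pass over `data`,
# updating the weights `w` in place with plain stochastic gradient descent.
function one_pass!(w, data; lr = 0.05)
    for (x, y) in data
        err = w[1] * x + w[2] - y      # prediction error for this sample
        w[1] -= lr * err * x           # gradient step for the slope
        w[2] -= lr * err               # gradient step for the intercept
    end
    return w
end

# Mean squared error of the linear model over the whole dataset.
mse(w, data) = sum((w[1] * x + w[2] - y)^2 for (x, y) in data) / length(data)

# Toy dataset drawn from y = 3x + 1 (invented for illustration).
data = [(x, 3x + 1) for x in range(-1, 1; length = 20)]

w = [0.0, 0.0]
one_pass!(w, data)                     # a single pass, like one `train!` call
loss_after_1 = mse(w, data)

# Nine more passes over the same data, for ten epochs in total.
for epoch in 2:10
    one_pass!(w, data)
end
loss_after_10 = mse(w, data)

println("loss after 1 epoch:   ", loss_after_1)
println("loss after 10 epochs: ", loss_after_10)
```

With Flux, the equivalent would presumably be wrapping the `Flux.train!(loss, train, opt, cb = evalcb)` call from the article in a similar `for` loop, which matches running the cell 10 times in the notebook.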
That is correct: in the notebook, I ran the `Flux.train!` function 10 times on the same MNIST data to get to 96% accuracy. Thanks for clarifying, dpsanders. I'll try to make that clearer in the article.