Just my $0.02 based on personal experience with Knet and Flux over the past 4 months.
A bit of background: I started learning Julia ~5 months ago. No previous expertise with deep learning / NNs, but a lot of experience with ML in general, all in R; probably about 6 years of intensive programming experience with R (developing packages, etc.). About 3-4 months ago I started experimenting with deep learning in Julia. The choice was between Python and Julia (R until recently had little to offer). I chose Julia and have been going back and forth between Knet and Flux, probably 50/50. Both have their own advantages, but I think neither is production-ready. With much regret, I must say that I wish I had chosen Python, and I am now in the process of slowly transitioning to PyTorch.
Flux: While I absolutely love the beauty and simplicity of Flux, it is not a production-ready library. Hopefully it can get there one day, but I think it's years away. The main problem is optimization on GPUs: with Flux it is virtually non-existent. The moment you try to use custom losses (i.e., anything not covered in the standard examples), you are going to run into major issues. Since most real deep learning requires GPUs, well, you are in trouble.
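To make "custom loss" concrete, here is a minimal sketch of the kind of thing I mean. The model, data, and the pinball loss are all made up for illustration; the point is only that it is an ordinary loss that no standard example covers:

```julia
using Flux
using CuArrays   # the GPU array backend Flux uses

# hypothetical toy model and data, moved onto the GPU
m = Dense(10, 1) |> gpu
x = gpu(rand(10, 100))
y = gpu(rand(1, 100))

# quantile (pinball) loss at τ = 0.9: a perfectly ordinary
# "custom" loss, nothing exotic about it mathematically
τ = 0.9
loss(x, y) = sum(max.(τ .* (y .- m(x)), (τ - 1) .* (y .- m(x)))) / length(y)

loss(x, y)   # fine on the CPU; non-standard broadcasts like this on
             # the GPU are exactly where I started hitting problems
```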
Just for fun, try using Flux on the GPU with a custom loss that raises something to a power > 2.0, e.g., x^3.0 or x^5.0. Good luck with that. How about a loss that requires you to simulate from some non-uniform distribution, e.g., a normal? Good luck with that as well. How about using Float32 instead of Float64 (a big performance boost on GPUs)? Nope, no can do. Even though these are mostly CuArrays' issues (except for the last one), it doesn't really matter: the end result is that GPU support via CuArrays is currently very lacking. Designing and optimizing GPU kernels, even with Julia magic, is a very labor-intensive process (although I am no expert by any means), which is why I say Flux is years away from being production-ready. Having said that, I have had a ton of fun hacking away at Flux (and continue to). Mike's implementation is a thing of beauty; some of his Julia code is just on another level. For example, take a look at the Flux implementation of backprop with the Adam optimizer: there is so much Julia magic there it hurts. Sometimes the source code can be very hard to understand (for a newbie), but it's a very rewarding experience when you eventually get it.
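For reference, this is what an Adam update boils down to in plain Julia. This is my own textbook sketch, not Flux's actual implementation (which threads the same math through tracked parameters with far more magic):

```julia
# Textbook Adam step (my own sketch, not Flux's code): p is a parameter
# array, g its gradient, m and v per-parameter moment buffers, t the
# step counter (starting at 1).
function adam!(p, g, m, v, t; η=0.001, β1=0.9, β2=0.999, ϵ=1e-8)
    @. m = β1 * m + (1 - β1) * g      # 1st moment: running mean of g
    @. v = β2 * v + (1 - β2) * g^2    # 2nd moment: running mean of g²
    m̂ = m ./ (1 - β1^t)               # correct the zero-initialization bias
    v̂ = v ./ (1 - β2^t)
    @. p -= η * m̂ / (√v̂ + ϵ)          # per-coordinate scaled gradient step
    return p
end

# toy usage
p = randn(3); g = randn(3)
m = zeros(3); v = zeros(3)
adam!(p, g, m, v, 1)
```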
Knet: Great library if you are trying to learn about NNs and understand their design and implementation (as well as learn some Julia). Everything is low-level, down to matrices. For example, Knet's low-level implementation of LSTMs helped me a great deal in understanding these models (see the first sketch below). Staring at and comparing the Flux vs. Knet implementations of LSTMs is also super helpful. Knet is well optimized for GPUs, has a ton of examples (maybe too many), and is extensively benchmarked against TensorFlow. The problem with Knet is that it always feels like a band-aid, and the code base is an absolute mess. For example, Knet uses its own implementation of autodiff (AutoGrad.jl), which is just a port of the Python autograd package. Then why not just use Python directly? Knet sort of defeats the whole point of using Julia. Doing anything more advanced and custom with GPUs (beyond the basic examples) is going to get you in trouble as well. BTW, here Knet relies on its own implementation of GPU arrays (KnetArray), which provides a bit more support than CuArrays, but it's still not enough. Try using higher-dimensional tensors and subsetting / slicing beyond the first index (second sketch below). Good luck with that. There are so many versions of Knet floating around that it often leads to a ton of confusion: sometimes you find a function that is supposed to work, only to discover that it's no longer supported. Basically, I feel like Knet is in need of a massive rewrite.
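Since the LSTM point is what sold me on Knet's style, here is roughly what a single LSTM step looks like when written at that level. This is my own from-memory sketch in the same spirit as Knet's examples, not Knet's verbatim code, and all the sizes are made up:

```julia
sigm(x) = 1 / (1 + exp(-x))

# One LSTM time step; rows are batch items, columns are features.
# W stacks the weight matrices of all four gates side by side, so one
# matmul computes every gate at once.
function lstm(W, b, h, c, x)
    g = hcat(x, h) * W .+ b
    n = size(h, 2)
    forget  = sigm.(g[:, 1:n])
    ingate  = sigm.(g[:, n+1:2n])
    outgate = sigm.(g[:, 2n+1:3n])
    change  = tanh.(g[:, 3n+1:4n])
    c = forget .* c .+ ingate .* change   # new cell state
    h = outgate .* tanh.(c)               # new hidden state
    return h, c
end

# toy sizes: batch of 8, 10 input features, hidden size 5
W = randn(15, 20); b = zeros(1, 20)
h = zeros(8, 5);   c = zeros(8, 5)
h, c = lstm(W, b, h, c, randn(8, 10))
```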
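And to make the slicing complaint concrete, a sketch of the kind of indexing I mean. KnetArray is Knet's GPU array type; exactly which slices fail depends on the version, so take the comments as my experience rather than a spec:

```julia
using Knet   # requires a CUDA-capable GPU

a = KnetArray(rand(Float32, 4, 5, 6))   # a 3-d tensor on the GPU

a[:, :, 1]   # contiguous slices like this generally worked for me
a[:, 2, :]   # non-contiguous slices along inner dimensions are the
             # kind of indexing that errored or was unsupported for me
```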
So my final suggestion would be to stick with Python, but hack around with Julia anyway, as it can be a nice learning experience.