That paper (and the rest of the information bottleneck work) has generated a fair amount of controversy (see this ICLR paper and the open reviews). For example, the “compression phase” where the mutual information decreases with more training doesn’t happen when you use ReLU (instead of tanh).
Interesting. A couple of comments. First, despite the extensive discussion of it in the original paper, I would not have expected the fitting and compression phases to be distinct and identifiable in the general case; that's an interesting detail, but it hardly seems like a crucial one. Second, I'd certainly expect there to be cases where it's possible to get good representations of the target variable without compression, but (perhaps?) the interesting cases are those in which achieving this is too difficult as a practical matter. That said, the examples where they show over-fitting in spite of compression seem extremely worrying for the whole information bottleneck picture, so that's definitely very interesting.
Which is faster: Knet, Flux, or MXNet?
Which one is easier?
Which one has more limitations?
Which one can deal with larger datasets?
Knet, at least for now.
Knet and MXNet.
All three use standard Julia tools for data handling, so it's a tie unless this is a speed question.
Flux generates code from Julia functions; the other two ship hardcoded CUDA kernels in the package. Hardcoded kernels are easier to modify and can be optimized for a single purpose, which gives Knet and MXNet their current speed edge (1), but it means you can't just build any GPU kernel you want (3). Flux's design is a major plus because it reads like a standard small Julia package, whereas the other two work like a standard ML framework, which always felt to me like a "language within a language" (2).
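To illustrate what "reads like a standard small Julia package" means: in Flux a model is just ordinary Julia code, plain arrays and functions, with no separate graph-definition language. A minimal sketch (layer sizes here are arbitrary and purely illustrative):

```julia
using Flux

# Parameters are plain Julia arrays (sizes made up for illustration).
W = randn(Float32, 2, 10)
b = zeros(Float32, 2)

# A "layer" is just a Julia function closing over those arrays.
model(x) = softmax(W * x .+ b)

# Call it like any other function.
y = model(randn(Float32, 10))
```

Because the model is ordinary Julia, you compose it with regular language features (loops, dispatch, your own types) rather than a framework-specific API, which is exactly the design trade-off described above.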
For CPU only, how do they rank speed-wise?
On CPU only, if you intend to use convolutions, Knet is currently very slow. MXNet is quite OK; I haven't tried Flux yet, but I expect it to match MXNet's speed.