I would add that the robustness of basic operators is also problematic.
For example, a vanilla RNN fails on GPU, and some pooling layers return incorrect results on CPU.
So it feels like a problematic situation when what was meant to be a core tool such as Flux doesn’t reliably support very basic NN building blocks like RNN and pooling layers on both CPU and GPU.
As for performance, my understanding is that the approach is to optionally dispatch to specialized backends, for example cuDNN or Torch. I think this approach, where speed can be added incrementally by expanding the bindings, is sound in its current form.
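To illustrate what I mean by adding speed incrementally, here is a minimal sketch in plain Python. It is not how Flux or any real library does it. Every name in it (`maxpool1d_reference`, `FAST_BACKENDS`, `register_backend`, `agrees_with_reference`) is hypothetical and made up for this illustration. The idea is that a simple reference implementation always exists, a faster backend is used only when one has been registered, and any backend can be checked against the reference.

```python
# Hypothetical sketch of optional backend dispatch with a reference fallback.
# None of these names belong to Flux, cuDNN, Torch, or any real library.

def maxpool1d_reference(xs, window):
    """Naive non-overlapping 1-D max pooling; the slow but trusted baseline.

    Trailing elements that do not fill a whole window are dropped.
    """
    return [max(xs[i:i + window])
            for i in range(0, len(xs) - window + 1, window)]


# Backend name -> implementation. Starts empty; bindings are added over time.
FAST_BACKENDS = {}


def register_backend(name, impl):
    """Make a specialized implementation available under the given name."""
    FAST_BACKENDS[name] = impl


def maxpool1d(xs, window, backend=None):
    """Use the requested backend if it is registered, else the reference."""
    impl = FAST_BACKENDS.get(backend, maxpool1d_reference)
    return impl(xs, window)


def agrees_with_reference(name, xs, window):
    """Check a registered backend against the reference on one input."""
    return FAST_BACKENDS[name](xs, window) == maxpool1d_reference(xs, window)
```

With this layout, a call such as `maxpool1d(xs, 2, backend="some_backend")` still works when that backend has not been registered, only slower, and `agrees_with_reference` gives a cheap way to catch a backend that returns wrong results before anyone relies on it.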
Personally, I’d be OK with some slowdown compared to PyTorch, for example (well, not 10x!), given how appealing the Flux framework is, knowing that the speed gap could be closed progressively.
However, if I can’t trust what an operator returns, that’s a killer for starting any serious work with it. I think reliability should be given higher priority: make it work first, then make it fast. So I’m sticking with MXNet for now.