I would add that the robustness of basic operators is also problematic.
For example, the vanilla RNN fails on GPU, and some pooling layers return incorrect results on CPU.
So it feels like a problematic situation when something meant to be a core tool, such as Flux, doesn't reliably support very basic NN building blocks like RNNs and pooling layers on both CPU and GPU.
As for performance, my understanding of the approach is to optionally support dispatch to specialized backends, for example cuDNN or Torch. I think such an approach, where speed can be added incrementally by expanding the bindings, is sound in its current form.
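To make that concrete, here is a minimal sketch (not Flux's actual code; `maxpool2x2` is a made-up illustrative name) of how Julia's multiple dispatch lets a slow-but-correct generic method coexist with faster specialized methods that a backend package can add later:

```julia
# Illustrative sketch of incremental backend dispatch; not part of Flux or NNlib.

# Generic fallback: plain Julia, correct for any matrix type, not tuned for speed.
function maxpool2x2(x::AbstractMatrix)
    [maximum(x[i:i+1, j:j+1]) for i in 1:2:size(x, 1)-1, j in 1:2:size(x, 2)-1]
end

# A GPU backend package could later add, without touching the code above,
# something like:
#     maxpool2x2(x::CuArray) = ... call into a cuDNN pooling kernel ...
# and callers keep calling `maxpool2x2`; speed arrives as the bindings expand.

x = rand(4, 4)
@show maxpool2x2(x)
```

The point is that the generic path gives correctness everywhere, and specialized backends only sharpen performance where they exist.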
Personally, I'd be ok with some slowdown compared to PyTorch, for example (well, not by 10X!), given how appealing the Flux framework is, knowing that the speed gap could be closed progressively.
However, if I can't trust the quality of what an operator returns, then that's a killer for starting any serious work with it. I think reliability should be given a higher priority: make it work first, then make it fast. So I'm sticking with MXNet for now.
Edward Yang himself (PyTorch core dev) seems to suggest that the future of PyTorch looks a lot like Julia here.
IMO it's not Julia that is copying PyTorch, but the converse: PyTorch mimics the Julia core more and more as it grows up. Also, PyTorch may be understaffed in that area, while Julia, which now has plenty of manpower, may bring some wins.
The comparison with BLAS (my opinion here) may be somewhat biased, since one predates the "kiss of death" era and, hurray, was successfully funded, coded, and deployed, while the other is an offspring of the "multiple-dispatch all the things" era, combined with the web and GitHub.
BTW, Edward Yang has been the top contributor since 2017, and this article is his top link on the forum.