The most recent benchmarks I’ve found have shown Flux a bit faster, though I would expect that not to be the case when deploying to large multi-GPU sessions, which is where TensorFlow shines. I also know of
http://denizyuret.github.io/Knet.jl/latest/tutorial/#Benchmarks-1
By all accounts, Knet is one of the fastest frameworks out there (and notice that those Knet benchmarks were neither written nor run by the Knet authors; they weren’t even written or run by regular Julia users!). Flux lags a bit behind in some areas, but uses Knet as a goalpost. Knet can thus serve as a bridge: you can see how Flux compares to Knet, and how Knet compares to the rest of the world. Knet sacrifices some generality to get there, relying on hardcoded CUDA kernels for many operations, so Flux is trying to match that hardcoded performance without relying on hardcoded kernels.