Flux vs PyTorch CPU performance

julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

Benchmarks:

julia> @btime $ww * $xx;
  21.679 μs (2 allocations: 125.08 KiB)

julia> @btime tanh.($xx);
  1.256 ms (2 allocations: 250.08 KiB)

julia> @btime @avx tanh.($xx);
  142.913 μs (2 allocations: 250.08 KiB)
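
For anyone who wants to rerun these timings, here is a minimal sketch of the setup. The shapes and element type of `ww` and `xx` are not given above, so the ones below are an assumption: Float32 arrays of these sizes are one of several choices consistent with the reported allocation sizes (125.08 KiB for the product, 250.08 KiB for the broadcast). `@btime` comes from BenchmarkTools and `@avx` from LoopVectorization (newer LoopVectorization releases call the macro `@turbo`).

```julia
using BenchmarkTools      # provides @btime
using LoopVectorization   # provides @avx

# Assumed shapes and eltype, not stated in the post.
# 64*500*4 bytes = 125 KiB for ww * xx, 128*500*4 bytes = 250 KiB for tanh.(xx).
ww = rand(Float32, 64, 128)
xx = rand(Float32, 128, 500)

# Interpolate with $ so global-variable overhead is not part of the timing.
@btime $ww * $xx;
@btime tanh.($xx);
@btime @avx tanh.($xx);
```

The absolute numbers will depend on the CPU and on the real array sizes, so only the ratio between the plain and `@avx` broadcasts is really comparable.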

I don't think AVX vectorization alone is what makes PyTorch fast, given the 10x difference (0.3 ms vs 3 ms). I wonder what other optimizations PyTorch has :thinking:
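
One thing that might be worth ruling out is threading: PyTorch can run CPU ops on several threads (see `torch.get_num_threads()`), while a plain broadcast in Julia runs on a single thread. Whether that explains any of the gap here is only a guess. Below is a rough sketch of a threaded `tanh` to test the idea. `threaded_tanh` is a hypothetical helper, not something from Flux, and the array size is again an assumption.

```julia
using Base.Threads

# Hypothetical helper: elementwise tanh with the index range
# split across the available Julia threads.
function threaded_tanh(x::Array)
    y = similar(x)
    @threads for i in eachindex(x)
        @inbounds y[i] = tanh(x[i])
    end
    return y
end

xx = rand(Float32, 128, 500)   # assumed size, not given in the post
yy = threaded_tanh(xx)

println("threads: ", nthreads())
println("matches broadcast: ", yy ≈ tanh.(xx))
```

Julia has to be started with more than one thread (for example by setting `JULIA_NUM_THREADS=4`) for this to differ from the serial version. Timing `threaded_tanh($xx)` with `@btime` next to the numbers above would show how much of the difference threading can account for on a given machine.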