Flux vs PyTorch CPU performance

A PR to include @avx is open: https://github.com/FluxML/NNlib.jl/pull/199