`Conv` is 2x slower than PyTorch `Conv` on CPU

Hmm, I just saw that the benchmark does not even use the GPU, so cuDNN is not involved. Sorry for the noise.

Could it be an MKL vs OpenBLAS thing then?
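
One quick way to check the PyTorch side of that guess is to ask the installed build which CPU math libraries it was compiled with. This is only a rough sketch: the helper name `torch_cpu_backends` is made up for illustration, and it only reports what the build supports, not which library a given `Conv` call actually dispatches to.

```python
import importlib.util


def torch_cpu_backends():
    """Report which CPU math libraries the local PyTorch build supports.

    Hypothetical helper for illustration; returns None if PyTorch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not available in this environment

    import torch

    return {
        "mkl": torch.backends.mkl.is_available(),        # Intel MKL (BLAS/LAPACK)
        "mkldnn": torch.backends.mkldnn.is_available(),  # MKL-DNN / oneDNN kernels
        "openmp": torch.backends.openmp.is_available(),  # OpenMP threading
        "threads": torch.get_num_threads(),              # intra-op CPU threads
    }


if __name__ == "__main__":
    print(torch_cpu_backends())
```

If `mkldnn` comes back `True`, the gap may have more to do with dedicated convolution kernels than with the BLAS library itself, and the thread count is worth comparing between the two benchmarks as well.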