BLAS vs CUBLAS benchmark

I wanted to measure the actual computation time only, excluding memory allocation. Thanks!
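
For anyone trying the same thing, here is a minimal sketch of one way to keep allocation out of the measurement: allocate all buffers first, run a warm-up call, and then start the timer only around the computation. This is an illustration under assumptions, not the benchmark from this post. The names `make_matrix`, `gemm`, and `time_compute_only` are hypothetical, and the pure-Python `gemm` is only a stand-in for a real BLAS or cuBLAS call.

```python
import time


def make_matrix(n, value):
    # Allocation step (stand-in for malloc / cudaMalloc); kept outside the timer.
    return [[value] * n for _ in range(n)]


def gemm(a, b, c):
    # Hypothetical stand-in for a BLAS/cuBLAS gemm call: naive C = A * B,
    # written into the preallocated output c so the kernel allocates no matrices.
    n = len(a)
    for i in range(n):
        row_a, row_c = a[i], c[i]
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += row_a[k] * b[k][j]
            row_c[j] = acc


def time_compute_only(kernel, args, warmup=1, repeats=5):
    # Warm-up runs are not timed (caches, lazy initialization, and so on).
    for _ in range(warmup):
        kernel(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(*args)
        samples.append(time.perf_counter() - start)
    # Return the best run plus all samples so the spread can be inspected.
    return min(samples), samples


n = 32  # small size so the pure-Python stand-in finishes quickly

# Allocation is timed separately and never mixed into the compute figure.
t0 = time.perf_counter()
a = make_matrix(n, 1.0)
b = make_matrix(n, 2.0)
c = make_matrix(n, 0.0)
alloc_seconds = time.perf_counter() - t0

best, samples = time_compute_only(gemm, (a, b, c))
print(f"allocation: {alloc_seconds:.6f} s (reported separately)")
print(f"compute, best of {len(samples)}: {best:.6f} s")
```

The same structure should carry over to the GPU side, with one caveat: cuBLAS calls generally return before the GPU has finished, so the timed region needs a synchronization point (for example `cudaDeviceSynchronize`, or timing with CUDA events) before the clock is stopped. Whether host-to-device copies count as "calculation" is a separate choice worth stating alongside the numbers.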