Thanks for the quick reply. So, following your suggestion, I would need to overload every operation for which BLAS provides an optimized routine (e.g. dot, norm),
right? That is still a lot of hacky work, and I wonder whether there is an easier way to handle this for AbstractArrays in Julia.
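
To make the concern concrete, this is roughly what I understand the suggestion to mean. It is only a sketch: `MyVec` is a made-up wrapper type for illustration, and I am assuming the wrapped data is a plain `Vector`, so that forwarding reaches the specialized `LinearAlgebra` methods (which may call BLAS).

```julia
using LinearAlgebra

# Hypothetical wrapper type, for illustration only.
struct MyVec{T} <: AbstractVector{T}
    data::Vector{T}
end

# Minimal AbstractArray interface, so the generic fallbacks work.
Base.size(v::MyVec) = size(v.data)
Base.getindex(v::MyVec, i::Int) = v.data[i]

# One forwarding method per operation: each call is passed on to the
# wrapped Vector, so LinearAlgebra's methods for Vector are used
# instead of the generic AbstractArray fallbacks.
LinearAlgebra.dot(x::MyVec, y::MyVec) = dot(x.data, y.data)
LinearAlgebra.norm(x::MyVec) = norm(x.data)

x = MyVec([3.0, 4.0])
y = MyVec([1.0, 2.0])
dot(x, y)
norm(x)
```

Every additional operation (axpy!, matrix-vector products, and so on) would need another line like the last two, which is the part that feels hacky to me.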