When testing some things in the REPL, I tried creating a 100x100 matrix of zeros and found it strange that the built-in zeros was slower than creating the same matrix with an array comprehension.
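For reference, the two constructions being compared might look like the sketch below. The exact expressions used in the REPL are not shown above, so the element type (Float64), the comprehension form, and the variable names n, a and b are assumptions for illustration only.

```julia
# Two ways to build a 100x100 Float64 matrix of zeros (Base only).
# Assumption: the original comparison used Float64 elements.
n = 100

a = zeros(Float64, n, n)              # built-in constructor
b = [0.0 for i in 1:n, j in 1:n]      # two-dimensional array comprehension

# Check that both approaches produce the same kind of result.
same = (a == b) && (typeof(a) == typeof(b)) && (size(a) == (n, n))
println(same)
```

Both expressions allocate a fresh Matrix{Float64}, so any timing difference comes from how the memory is obtained and filled, not from the result.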
That said, I see more timing variation for zeros than for the comprehension, and neither is consistently faster than the other. This could be related to memory addresses.
In particular, you only need to use $ interpolation on non-constant global variables (or on expressions that you want evaluated before benchmarking starts). Float64 is a constant, so it does not need to be interpolated.
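A minimal sketch of that rule, assuming the benchmarks are run with the third-party BenchmarkTools package and its @btime macro (the thread does not show the exact calls, so the variable n and the three expressions here are illustrative):

```julia
using BenchmarkTools  # third-party package; install with: import Pkg; Pkg.add("BenchmarkTools")

n = 100  # non-constant global: interpolate with $ so the benchmark sees a concrete value

# Float64 is a constant binding, so it is written without $.
@btime zeros(Float64, $n, $n);

# Same idea for the comprehension: only n is interpolated.
@btime [0.0 for i in 1:$n, j in 1:$n];

# $(...) evaluates an expression once, before timing starts,
# so only the call to sum is measured here, not the allocation.
@btime sum($(zeros(Float64, n, n)));
```

Without the $, each access to n inside the benchmarked expression would go through an untyped global, and that overhead would be included in the reported time.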