I have two programs that are almost identical, except that one uses Float32 vectors and arrays while the other uses Float64. The Float64 program (named prog_f64.jl) is about 5x faster than the Float32 one (named proj_f32.jl). Does anybody know why this happens? You can see my programs here: https://gist.github.com/jinliangwei/74d3871bebc1251b651cc6fd511fed08
Also, sqrt should be faster than ^0.5, and is generic with respect to Float types.
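To make the point about type genericity concrete, here is a small sketch of how Julia's promotion rules treat a Float32 value. The variable `x` is just an illustration and is not taken from the gist. If the Float32 program mixes in Float64 literals such as `0.5` (an assumption about the gist, not something verified here), intermediate results get promoted to Float64, and the conversions back and forth could account for part of the slowdown.

```julia
x = 2.0f0                 # a Float32 value

a = sqrt(x)               # sqrt keeps the argument type
b = x^0.5                 # 0.5 is a Float64 literal, so the result is promoted
c = x^0.5f0               # a Float32 literal keeps everything in Float32
d = 0.5 * x               # the same promotion happens with any arithmetic

println(typeof(a))        # Float32
println(typeof(b))        # Float64
println(typeof(c))        # Float32
println(typeof(d))        # Float64
```

Running `@code_warntype` on the hot function is one way to check whether such promotions actually occur in a given program.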
That being said, constants are always a pain for me when writing generic code that is supposed to work for Int16 / Int32 / Int64 or Float32 / Float64. I tend to get the needed type somewhere, e.g. using typeof, eltype, a where clause, or an explicit float_t parameter, and then end up with convert calls all over the place. This looks somewhat ugly and I am not sure how well it plays with autodiff, but I know no better way. Does anyone have better tips?
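For reference, the pattern described above might look like the following minimal sketch. The function names `half_sum_sq` and `shifted` are made up for illustration; `zero`, `oftype`, and the `T(...)` constructor are standard Base functions. This is only one way to do it, not a recommended idiom.

```julia
# Take the element type from a `where` clause and convert each constant once.
function half_sum_sq(v::AbstractVector{T}) where {T<:AbstractFloat}
    half = T(0.5)          # constant converted to the element type
    acc = zero(T)          # accumulator of the same type, avoids promotion
    for x in v
        acc += half * x * x
    end
    return acc
end

# `oftype` converts a constant to the type of an existing value.
shifted(x::AbstractFloat) = x + oftype(x, 0.25)

r32 = half_sum_sq(Float32[1, 2, 3])
r64 = half_sum_sq(Float64[1, 2, 3])
s32 = shifted(1.0f0)
println(typeof(r32), " ", typeof(r64), " ", typeof(s32))
```

Hoisting the converted constants to the top of the function at least keeps the conversions in one place.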
It would be possible to have a @literal_type T begin ... end macro that rewrites an expression, converting all numeric literals to type T. That would allow you to write your code with natural literal syntax and have the literals converted to the type given by some type parameter.
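A rough sketch of what such a macro could look like is below. To be clear, `@literal_type` and the helper `rewrite_literals` are hypothetical and do not exist in Base or in any package as far as this thread establishes. The sketch only rewrites floating-point literals and leaves integer literals alone, on the assumption that converting integers would break things like array indices.

```julia
# Hypothetical helper: walk the expression tree and wrap float literals.
rewrite_literals(T, x) = x                       # symbols, integers, etc. stay as-is
rewrite_literals(T, x::AbstractFloat) =
    Expr(:call, GlobalRef(Base, :convert), T, x) # 0.5 becomes convert(T, 0.5)
rewrite_literals(T, ex::Expr) =
    Expr(ex.head, map(a -> rewrite_literals(T, a), ex.args)...)

# Hypothetical macro: T is resolved in the caller's scope, so it can be
# a type parameter of the enclosing function.
macro literal_type(T, ex)
    esc(rewrite_literals(T, ex))
end

function halve(x::T) where {T<:AbstractFloat}
    @literal_type T begin
        0.5 * x
    end
end

h32 = halve(3.0f0)
h64 = halve(3.0)
println(typeof(h32), " ", typeof(h64))
```

Since the literal is known at compile time, the `convert` call should normally be constant-folded, though that has not been benchmarked here.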
Converting only literals from x to T(x) is much easier than what ChangePrecision.jl currently does, so it should probably be a separate macro (but could be in the same package; PRs welcome).