I’m writing some numeric code that interfaces with a C library, and I have to do some type conversions for the interface to work. Profiling this code gave a result that surprised me: converting a quaternion, which my code represents internally as a struct of four Float64s, to a plain Array{Float32} (which is what the C library needs) takes up an enormous fraction of my computation time.

The last convert() routine takes up something like 25% of all my computation. Note that although I have written the Quaternion struct to be type-agnostic, in my code I always initialize Quaternions as Quaternion{Float64}s.

As written, the conversion appears to take three steps, each of which allocates a new vector:

1. compute imag(q) (which allocates a new Vector{Float64})

2. concatenate imag(q) and real(q) (which allocates another Vector{Float64})

3. convert that result to a Vector{Float32} (which allocates another vector for the result)

You can pretty easily cut the number of allocations by a factor of 3 by skipping those two intermediate vectors and constructing [T(q.v1), T(q.v2), T(q.v3), T(q.s)] directly for some type T (in this case, T = Float32).
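Here is a minimal sketch comparing the two approaches. It assumes a Quaternion struct with fields s, v1, v2, v3 (the field names are taken from the expression above; the original definition is not shown, so this struct and the helper names imag_part, to_vector_chained and to_vector_direct are hypothetical).

```julia
# Hypothetical stand-in for the poster's type: scalar part s, vector part v1..v3.
struct Quaternion{T<:Real}
    s::T
    v1::T
    v2::T
    v3::T
end

# Allocates a Vector holding the imaginary (vector) part.
imag_part(q::Quaternion) = [q.v1, q.v2, q.v3]

# Three-step version: imaginary part, concatenation, then element conversion.
# Each step allocates its own vector.
function to_vector_chained(::Type{T}, q::Quaternion) where {T}
    tmp = vcat(imag_part(q), q.s)      # second allocation
    return convert(Vector{T}, tmp)     # third allocation
end

# Direct version: a typed array literal converts each field to T
# and allocates only the result vector.
to_vector_direct(::Type{T}, q::Quaternion) where {T} =
    T[q.v1, q.v2, q.v3, q.s]

q = Quaternion(1.0, 2.0, 3.0, 4.0)
v_chained = to_vector_chained(Float32, q)
v_direct = to_vector_direct(Float32, q)
```

Both functions should return the same Vector{Float32}. Comparing them with @allocated (or BenchmarkTools, if you have it installed) on your own machine is the easiest way to confirm the difference in allocations.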

However, before worrying about that, I’d suggest checking out https://github.com/JuliaArrays/StaticArrays.jl, as it’s designed largely to improve the performance of constructing lots of small, fixed-size arrays.
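A rough sketch of what that could look like, assuming the StaticArrays package is installed and the same hypothetical Quaternion layout (fields s, v1, v2, v3). The helper name to_svector is made up for illustration.

```julia
using StaticArrays

# Hypothetical stand-in for the poster's type.
struct Quaternion{T<:Real}
    s::T
    v1::T
    v2::T
    v3::T
end

# Build a fixed-size, 4-element static vector of element type T.
# The size is part of the type, so no heap-allocated Vector is needed here.
to_svector(::Type{T}, q::Quaternion) where {T} =
    SVector{4,T}(T(q.v1), T(q.v2), T(q.v3), T(q.s))

q = Quaternion(1.0, 2.0, 3.0, 4.0)
sv = to_svector(Float32, q)

# If the C interface still needs an ordinary Array, one option is to reuse
# a preallocated buffer instead of creating a new Vector on every call.
buf = Vector{Float32}(undef, 4)
buf .= sv
```

Whether the SVector can be handed to the C library directly depends on how the call is wrapped, so treat the buffer copy as one possible workaround, not the only route.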