# Efficiency of converting from Array{Float64} to Array{Float32}

I’m writing some numeric code that interfaces with a C library, and there are some type conversions that I have to do in order for the interface to work right. Profiling this code yielded a result that was surprising (to me), which was that converting from a quaternion which is represented internally in my code as a struct of four Float64’s, to an unencapsulated Array{Float32} (which is what the C library needs) takes up an enormous fraction of my computation time.

The code looks like:

``````struct Quaternion{T<:Real}
s::T
v1::T
v2::T
v3::T
end

Quaternion(v::Vector) = Quaternion(v, v, v, v)
Quaternion{T}(a::Array{T,1}) where T<:Real = Quaternion{T}(a, a, a, a)
Quaternion{T}(s::T, imag::Array{T,1}) where T<:Real = Quaternion{T}(s, imag, imag, imag)

Base.imag(q::Quaternion) = [q.v1, q.v2, q.v3]
Base.real(q::Quaternion) = q.s

Base.convert(::Type{Array{Float32}}, q::Orientation.Quaternion{Float64}) = convert(Array{Float32}, [imag(q); real(q)])
``````

with the last convert() routine taking upon something like 25% of all my computation. Note that although I have written the Quaternion struct to be type agnostic, in fact in my code I always initialize Quaternions as Quaternion{Float64}'s.

What’s the best way to speed this up?

• compute `imag(q)` (which allocates a new `Vector{Float64}`)
• concatenate `imag(q)` and `real(q)` (which allocates another `Vector{Float64}`)
• convert that result to a `Vector{Float32}` (which allocates another vector for the result)

You can pretty easily reduce the number of allocations by a factor of 3 by avoiding those two intermediate vector representations by constructing `[T(q.v1), T(q.v2), T(q.v3). T(q.s)]` for some type `T` (in this case, `T = Float32`).

However, before worrying about that, I’d suggest checking out https://github.com/JuliaArrays/StaticArrays.jl as it’s largely designed exactly to improve the performance of constructing lots of small, fixed-size, arrays.

4 Likes

Excellent, thanks. I did not know about StaticArrays, but it looks perfect for my application so I will check it out.

For my own edification: I changed the routine to

``````Base.convert(::Type{Array{Float32}}, q::Orientation.Quaternion{Float64}) = convert(Array{Float32}, [q.v1; q.v2; q.v3; q.s])
``````

and it shaved nearly ten seconds off my runtime. Is that similar to what you suggested,

``````Base.convert(::Type{Array{Float32}}, q::Orientation.Quaternion{Float64}) = [Float32{q.v1}; Float32{q.v2}; Float32{q.v3}; Float32{q.s}]
``````

They seem to time almost identically.

I would write

``````function Base.convert(::Type{Array{T}}, q::Orientation.Quaternion) where T
[T(q.v1); T(q.v2); T(q.v3); T(q.s)]
end
``````

or

``````function Base.convert(::Type{Array{T}}, q::Orientation.Quaternion) where T
T[q.v1; q.v2; q.v3; q.s]
end
``````

Because this only allocates a single array, while

``````Base.convert(::Type{Array{Float32}}, q::Orientation.Quaternion{Float64}) = convert(Array{Float32}, [q.v1; q.v2; q.v3; q.s])
``````

first allocates `[q.v1; q.v2; q.v3; q.s]` as an Array{Float64}, and then converts that to an array of `Float32`, which also needs memory allocated.

``````julia> function foo(x::Array, ::Type{T}) where T
T[x,x,x]
end
foo (generic function with 1 method)

julia> function foo2(x::Array, ::Type{T}) where T
convert(Array{T}, [x,x,x])
end
foo2 (generic function with 1 method)

julia> x = randn(3)
3-element Array{Float64,1}:
1.323457465707541
0.9684809826996043
0.49214576550482236

julia> foo(x, Float32)
3-element Array{Float32,1}:
1.3234575
0.968481
0.49214578

julia> foo2(x, Float32)
3-element Array{Float32,1}:
1.3234575
0.968481
0.49214578

julia> using BenchmarkTools

julia> @benchmark foo(\$x, Float32)
BenchmarkTools.Trial:
memory estimate:  96 bytes
allocs estimate:  1
--------------
minimum time:     49.355 ns (0.00% GC)
median time:      54.396 ns (0.00% GC)
mean time:        73.066 ns (21.88% GC)
maximum time:     75.538 μs (99.87% GC)
--------------
samples:          10000
evals/sample:     988

julia> @benchmark foo2(\$x, Float32)
BenchmarkTools.Trial:
memory estimate:  256 bytes
allocs estimate:  5
--------------
minimum time:     149.124 ns (0.00% GC)
median time:      159.319 ns (0.00% GC)
mean time:        205.727 ns (19.57% GC)
maximum time:     91.791 μs (99.76% GC)
--------------
samples:          10000
evals/sample:     832
``````
1 Like

You also might want to check out https://github.com/FugroRoames/Rotations.jl which uses StaticArrays internally and is generally quite good.

2 Likes