Efficiency of converting from Array{Float64} to Array{Float32}


#1

I’m writing some numeric code that interfaces with a C library, and there are some type conversions that I have to do for the interface to work right. Profiling this code yielded a surprising (to me) result: converting from a quaternion, which is represented internally in my code as a struct of four Float64s, to an unencapsulated Array{Float32} (which is what the C library needs) takes up an enormous fraction of my computation time.

The code looks like:

struct Quaternion{T<:Real}
    s::T
    v1::T
    v2::T
    v3::T
end

Quaternion(v::Vector) = Quaternion(v[1], v[2], v[3], v[4])
Quaternion{T}(a::Array{T,1}) where T<:Real = Quaternion{T}(a[1], a[2], a[3], a[4])
Quaternion{T}(s::T, imag::Array{T,1}) where T<:Real = Quaternion{T}(s, imag[1], imag[2], imag[3])

Base.imag(q::Quaternion) = [q.v1, q.v2, q.v3]
Base.real(q::Quaternion) = q.s

Base.convert(::Type{Array{Float32}}, q::Orientation.Quaternion{Float64}) = convert(Array{Float32}, [imag(q); real(q)])

with the last convert() routine taking up something like 25% of all my computation. Note that although I have written the Quaternion struct to be type-agnostic, in my code I always initialize Quaternions as Quaternion{Float64}s.

What’s the best way to speed this up?


#2

Your conversion routine currently does:

  • compute imag(q) (which allocates a new Vector{Float64})
  • concatenate imag(q) and real(q) (which allocates another Vector{Float64})
  • convert that result to a Vector{Float32} (which allocates another vector for the result)

You can pretty easily reduce the number of allocations by a factor of 3 by avoiding those two intermediate vector representations and instead constructing [T(q.v1), T(q.v2), T(q.v3), T(q.s)] directly for some type T (in this case, T = Float32).

However, before worrying about that, I’d suggest checking out https://github.com/JuliaArrays/StaticArrays.jl as it’s designed precisely to improve the performance of constructing lots of small, fixed-size arrays.
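For reference, a minimal sketch of what the StaticArrays approach could look like, assuming the Quaternion struct from the original post (the to_float32 helper name is hypothetical, not part of any package API):

using StaticArrays

struct Quaternion{T<:Real}
    s::T
    v1::T
    v2::T
    v3::T
end

# An SVector has a fixed, statically known size, so this conversion
# typically involves no heap allocation at all. Its memory layout
# matches a C array of four floats, so it can be passed to C directly.
to_float32(q::Quaternion) = SVector{4,Float32}(q.v1, q.v2, q.v3, q.s)

Because an SVector{4,Float32} is an isbits value stored inline, the conversion compiles down to four float truncations with no garbage generated per call.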


#3

Excellent, thanks. I did not know about StaticArrays, but it looks perfect for my application so I will check it out.

For my own edification: I changed the routine to

Base.convert(::Type{Array{Float32}}, q::Orientation.Quaternion{Float64}) = convert(Array{Float32}, [q.v1; q.v2; q.v3; q.s])

and it shaved nearly ten seconds off my runtime. Is that similar to what you suggested, i.e.

Base.convert(::Type{Array{Float32}}, q::Orientation.Quaternion{Float64}) = [Float32(q.v1); Float32(q.v2); Float32(q.v3); Float32(q.s)]

? They seem to time almost identically.


#4

I would write

function Base.convert(::Type{Array{T}}, q::Orientation.Quaternion) where T
    [T(q.v1); T(q.v2); T(q.v3); T(q.s)]
end

or

function Base.convert(::Type{Array{T}}, q::Orientation.Quaternion) where T
   T[q.v1; q.v2; q.v3; q.s]
end

because either version allocates only a single array, while

Base.convert(::Type{Array{Float32}}, q::Orientation.Quaternion{Float64}) = convert(Array{Float32}, [q.v1; q.v2; q.v3; q.s])

first allocates [q.v1; q.v2; q.v3; q.s] as an Array{Float64}, and then converts that to an array of Float32, which also needs memory allocated.

julia> function foo(x::Array, ::Type{T}) where T
           T[x[1],x[2],x[3]]
       end
foo (generic function with 1 method)

julia> function foo2(x::Array, ::Type{T}) where T
           convert(Array{T}, [x[1],x[2],x[3]])
       end
foo2 (generic function with 1 method)

julia> x = randn(3)
3-element Array{Float64,1}:
 1.323457465707541  
 0.9684809826996043 
 0.49214576550482236

julia> foo(x, Float32)
3-element Array{Float32,1}:
 1.3234575 
 0.968481  
 0.49214578

julia> foo2(x, Float32)
3-element Array{Float32,1}:
 1.3234575 
 0.968481  
 0.49214578

julia> using BenchmarkTools

julia> @benchmark foo($x, Float32)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     49.355 ns (0.00% GC)
  median time:      54.396 ns (0.00% GC)
  mean time:        73.066 ns (21.88% GC)
  maximum time:     75.538 μs (99.87% GC)
  --------------
  samples:          10000
  evals/sample:     988

julia> @benchmark foo2($x, Float32)
BenchmarkTools.Trial: 
  memory estimate:  256 bytes
  allocs estimate:  5
  --------------
  minimum time:     149.124 ns (0.00% GC)
  median time:      159.319 ns (0.00% GC)
  mean time:        205.727 ns (19.57% GC)
  maximum time:     91.791 μs (99.76% GC)
  --------------
  samples:          10000
  evals/sample:     832


#5

You also might want to check out https://github.com/FugroRoames/Rotations.jl which uses StaticArrays internally and is generally quite good.