Reinterpret vector into single struct

Hi,

I’m optimizing a bit of code where I need to reinterpret a vector into a struct. For example, I have the MWE example below

struct Test
    a1::Float64
    a2::Float64
end

vec = [2.0, 1.0]
using BenchmarkTools

@benchmark t = reinterpret(Test, $vec)[1]

For this simple example, I’m getting a 5.6 ns benchmark. A significant chunk of this time seems to come from accesing index 1 of the reinterpreted object, as the following

@benchmark t = reinterpret(Test, $vec)

takes only 1.9 ns. Is there a way to get directly the single object of type Test that is produced by reinterpret without paying that extra cost?

PS: If curious about why I’m doing this, I’m doing something similar to what DiffCache form PreallocationTool does, where it preallocates memory to produce objects of type ForwardDiff.Dual. For my use case I can get better performance by using a simplified version of DiffCache, but the point above remains a performance bottleneck.

I would recommend trying to do something like Test(vec[1], vec[2]) or whatever, since julia doesn’t really like reinterpretation, and reinterpret is slow for a reason.

Here’s something you could do if you really need performance though:

# This is marked @noinline so that the string interpolation doesn't cause problems with the optimizer. 
# Branch prediction makes it so this function has no real runtime cost.
@noinline function _do_checks(::Type{T}, v::Vector{U}) where {T, U}
    isbitstype(T) || 
        error("Type must be inline allocated, got $T")
    # I'm using isprimitive type here since primitive types aren't padded, and observing padding is illegal
    # If you're comfortable using internals, you could use !(Base.datatype_haspadding(U)) instead
    isprimitivetype(U) ||
        error("Vector can only contain primitive types, got $U")
    sizeof(v) >= sizeof(T) ||
        error("Insufficient space. Vector has $(sizeof(v)) bytes and $T needs $(sizeof(T)) bytes")
end

function get_a(::Type{T}, v::Vector{U}) where {T, U}
    _do_checks(T, v)
    GC.@preserve v begin       # Make sure the vector stays alive!
        p::Ptr{T} = pointer(v) # convert from Ptr{U} to Ptr{T}
        unsafe_load(p)         # Load the first element from that pointer
    end
end
struct Test
    a::Float64
    b::Float64
end

let v = [1.0, 2.0, 3.0]
    @benchmark get_a(Test, $v)
end
#+RESULTS:
: BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
:  Range (min … max):  3.246 ns … 18.295 ns  ┊ GC (min … max): 0.00% … 0.00%
:  Time  (median):     3.897 ns              ┊ GC (median):    0.00%
:  Time  (mean ± σ):   3.821 ns ±  0.477 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
: 
:   ▆▂              █▄                                         ▁
:   ██▄▁▁▁▃▁▁▃▁▁▁▃▃▃██▄▁▁▁▃▁▄▄▁▃▁▅▇▇▆▅▅▇█▆▇▆▆▆▆▅▆▆▅▇▆▆▅▅▆▅▆▅▆▅ █
:   3.25 ns      Histogram: log(frequency) by time     5.54 ns <
: 
:  Memory estimate: 0 bytes, allocs estimate: 0.

You really need to be careful doing things like this though as it’s easy to cause serious correctness problems if you screw it up.

4 Likes

Thanks @Mason!

That does the job. In my case I can also skip the _do_checks function as I can explicitly define the function only for a combination of input types that satisfies those criteria.

If Test is immutable, what’s the advantage of forcing it to occupy the same memory than the original array? Won’t the storage of the instance be freely handled by the compiler anyway?

It’s not about forcing, it’s just that some function gives you a Vector{U} or whatever and you want to just eagerly grab load the first N bytes of that as some other type T.

1 Like

I still don’t get it. I could see the win for a mutable struct. But if it’s isbits, you’re not gaining anything over the naive version, are you? On my machine,

function get_a2(::Type{T}, v::Vector{U}) where {T, U}
     T(v[1], v[2])
end

is twice as fast as get_a (2.5ns vs 5ns).

1 Like

In this specific case you can do that, but what if you have v::Vector{UInt8} or something? The general problem is that you sometimes are given some stream of data and you want to interpret it as another type.

i.e.

julia> let v = rand(UInt8, 100)
           get_a(Test, v)
       end
Test(1.6347159102878037e-124, 2.5913667139837022e-58)
2 Likes

Oh, OK. I thought that reinterpret over the original UInt8 vector would do the trick, but

julia> let v = rand(UInt8, 16)
                  @benchmark get_a2(Test, reinterpret(Float64, $v))
              end
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  10.594 ns … 41.792 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.678 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.730 ns ±  0.826 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                         █                                     
  ▂▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▂ ▂
  10.6 ns         Histogram: frequency by time        10.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

your version indeed is 2X faster in this case.

Depending on your application, also consider

 rc =  ccall(:jl_reshape_array, Array{Test,1}, (Any, Any, Any), Array{Test,1}, vec, (1,))

This is the old implementation of reinterpret. Various caveats apply.

There is always a reason, but it’s not a good reason :frowning:

I’d say skip the programmatic checks and do unsafe_load with pointer-casting, as @Mason said.

Remember that this cannot be done fast and safe and generic – choose two. You don’t care about “generic”, so all is fine.

As a word of warning: If you reinterpret with custom structs, then you must know the C memory layout rules and structure padding. Structure padding is scary.

Julia Bool is especially scary in context of structure padding, since the language assumes that the leading 7 bit are zero. If they are not zero, then you are in undefined-behavior bats-in-your-nose land. You cannot check whether they are zero after reinterpretation, because the compiler knows that these 7 bit are zero and can optimize away your check. Such a malformed Bool can do things like

julia> struct X
       b::Bool
       end

julia> function flub(bs) 
       c = @inbounds bs[1].b 
       if c===true
       1
       elseif c===false
       0
       else
       3
       end
       end
flub (generic function with 1 method)

julia> flub(reinterpret(X,[0x46]))
3

julia> reinterpret(X,[0x46])[1].b==false
true

julia> function flub2(bs) 
       c = if rand()==1337 Any[1][1] else @inbounds bs[1].b end
       if c===true
       1
       elseif c===false
       0
       else
       3
       end
       end
flub2 (generic function with 1 method)

julia> flub2(reinterpret(X,[0x46]))
0

edit: Messed up code example during copy-paste. It is supposed to demonstrate that:

  1. A malformed Bool can be neither true nor false
  2. However, it can become normalized to true/false when stored into memory, e.g. during Boxing due to type instability
  3. This means that behavior of malformed Bool is not well-defined – it depends on the state of type inference and optimizer/compiler internal state, and it is not correctly reproduced in the top-level REPL or in the interpreter.

This is not a bug in julia, its a bug in the user code that misunderstands the implicit structure padding of Bool and reinterpret.

3 Likes

Thanks a bunch for all this discussion and specially for pointing out all these caveats and danger zones, this is very enlightening :slight_smile:.