Is this bitcast function sane and safe?

jw3126 · October 18, 2019, 8:41am

I would like to take the bits of one struct and reinterpret them as the bits of another type. Semantically the following does the desired job:

function check_bitcast(T, s)
    S = typeof(s)
    isbitstype(T) || throw(ArgumentError("Can only cast into bitstype."))
    isbitstype(S) || throw(ArgumentError("Can only cast from bitstype."))
    sizeof(T) == sizeof(S) || throw(ArgumentError("Can only cast between types of equal size."))
end

function bitcast_slow(T, s)
    check_bitcast(T, s)
    arr = reinterpret(T, [s])
    first(arr)
end

However, it is too slow (see below). The following seems to achieve the same result, but faster:

function unsafe_bitcast(::Type{T}, s::S) where {T, S}
    rt = Ref{T}()
    rs = Ref{S}(s)
    GC.@preserve rt rs begin
        pt = Ptr{UInt8}(Base.unsafe_convert(Ref{T}, rt))
        ps = Ptr{UInt8}(Base.unsafe_convert(Ref{S}, rs))
        Base._memcpy!(pt, ps, sizeof(T))
    end
    return rt[]
end

function bitcast(::Type{T}, s::S) where {T, S}
    check_bitcast(T, s)
    unsafe_bitcast(T, s)
end

struct F64; value::Float64; end
struct U8; value::UInt8; end

using BenchmarkTools
t = ntuple(U8, 8)
@assert bitcast(F64, t) === bitcast_slow(F64, t)

@btime bitcast($F64, $t) #   1.297 ns (0 allocations: 0 bytes)
@btime bitcast_slow($F64, $t)  #   32.021 ns (2 allocations: 128 bytes)

Is the fast implementation sane + correct + safe?
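For comparison, here is a minimal sketch of the same idea that does not call the internal `Base._memcpy!`. It only uses `Ref`, `GC.@preserve`, `Base.unsafe_convert` and `unsafe_load`. The name `bitcast_load` is made up for this example, and I have not benchmarked it against the version above.

```julia
struct F64; value::Float64; end
struct U8; value::UInt8; end

# Hypothetical helper: reinterpret the bits of `s` as a value of type `T`.
function bitcast_load(::Type{T}, s::S) where {T, S}
    isbitstype(T) || throw(ArgumentError("Can only cast into bitstype."))
    isbitstype(S) || throw(ArgumentError("Can only cast from bitstype."))
    sizeof(T) == sizeof(S) || throw(ArgumentError("Can only cast between types of equal size."))
    rs = Ref{S}(s)                       # give `s` an address
    GC.@preserve rs begin                # keep `rs` alive while the raw pointer is in use
        ps = Base.unsafe_convert(Ptr{S}, rs)
        unsafe_load(Ptr{T}(ps))          # read the same bytes back as a `T`
    end
end

t = ntuple(U8, 8)
x = bitcast_load(F64, t)                 # an F64 built from the 8 bytes of `t`
t2 = bitcast_load(NTuple{8,U8}, x)       # casting back recovers the original tuple
```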

stevengj · October 18, 2019, 1:31pm

Why not just do

x = 3.14159
b = reinterpret(UInt64, x)
b % UInt8 # first (least significant) byte

rather than messing around with pointers?

What are you trying to accomplish here by a bitcast? If you just want to write/read raw bytes to/from a stream, you can use write and read, for example.
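To illustrate the `write` and `read` route, here is a small sketch that round-trips an isbits struct through an in-memory stream by way of a `Ref`. The `IOBuffer` merely stands in for a file, and `F64` is the wrapper struct from the first post.

```julia
struct F64; value::Float64; end

io = IOBuffer()
write(io, Ref(F64(3.14159)))   # writes the sizeof(F64) raw bytes of the struct
seekstart(io)

r = Ref{F64}()
read!(io, r)                   # fills the Ref from the next sizeof(F64) bytes
y = r[]                        # the struct read back from the stream
```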

jw3126 · October 18, 2019, 3:05pm

Because this works only for a few built-in types. My main interest is converting NTuple{N,UInt8} into a custom struct.

struct F64; value::Float64;end
reinterpret(F64, 1)
# throws bitcast: target type not a leaf primitive type

What I really want is to mmap a file that contains nested structs in a “packed” memory layout. I would like to mirror the packed memory layout with a Julia struct that contains a private tuple of UInt8 bytes and relies heavily on getproperty overloading.
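As a rough sketch of what such a struct could look like: the layout below (a `UInt8` tag followed by a little-endian `Float32` energy) and the names `PackedParticle` and `load_le` are invented for illustration and are not the real file format. This variant decodes fields with shifts instead of pointers.

```julia
# Hypothetical packed record: byte 0 is a tag, bytes 1-4 hold a little-endian Float32.
struct PackedParticle
    bytes::NTuple{5,UInt8}
end

# Assemble a little-endian unsigned integer from the bytes following `offset` (0-based).
function load_le(::Type{U}, bytes::NTuple{N,UInt8}, offset::Int) where {U<:Unsigned, N}
    x = zero(U)
    for i in 1:sizeof(U)
        x |= U(bytes[offset + i]) << (8 * (i - 1))
    end
    return x
end

# Decode a field only when it is requested.
function Base.getproperty(p::PackedParticle, name::Symbol)
    bytes = getfield(p, :bytes)
    if name === :tag
        return bytes[1]
    elseif name === :energy
        return reinterpret(Float32, load_le(UInt32, bytes, 1))
    else
        return getfield(p, name)
    end
end

# 0x3fc00000 is the bit pattern of 1.5f0, stored here least significant byte first.
p = PackedParticle((0x02, 0x00, 0x00, 0xc0, 0x3f))
p.tag      # the raw tag byte
p.energy   # the decoded Float32
```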

stevengj · October 19, 2019, 1:37am

Why not just read directly into corresponding struct types? Why use an NTuple{N,UInt8} at all?

jw3126 · October 19, 2019, 7:05am

Yeah, I implemented that a long time ago and it is what I currently use. However, often I don’t care about the full struct; I just want to compute statistics over one or two fields. In that case it is a waste to “unpack” the full struct. With the mmap approach I only have to pay for the fields that I am actually using. With a slightly different format I got a 2x speedup by doing this.

stevengj · October 20, 2019, 1:49pm

The same is possible for ordinary file I/O: look up seek and skip.
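A minimal sketch of that, assuming fixed-size records: the 13-byte layout, the constants and the function name `read_energy` are hypothetical, and an `IOBuffer` stands in for the file.

```julia
# Hypothetical record: 1 tag byte, a little-endian Float32 energy, 8 further bytes.
const RECORD_SIZE = 13
const ENERGY_OFFSET = 1

# Read only the energy of record `k` (1-based), skipping everything else.
function read_energy(io::IO, k::Integer)
    seek(io, (k - 1) * RECORD_SIZE + ENERGY_OFFSET)   # stream positions are 0-based
    return reinterpret(Float32, ltoh(read(io, UInt32)))
end

# Build two records in memory to try it out.
io = IOBuffer()
for (tag, energy) in ((0x01, 1.5f0), (0x02, 2.5f0))
    write(io, tag)
    write(io, htol(reinterpret(UInt32, energy)))
    write(io, zeros(UInt8, 8))
end

e2 = read_energy(io, 2)   # only 4 bytes of the second record are read
```

When scanning records sequentially, `skip(io, n)` does the same job relative to the current position.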

jw3126 · October 20, 2019, 6:08pm

Sure, it is possible, and I thought about doing this. But I think reading whole structs would give a much nicer high-level API. Here is what I have in mind (the structs are particles and the file format is IAEA/EGS phase space format):

using PhaseSpaceIO, Transducers, OnlineStats

particles = load("huge.IAEAphsp")
xf = Filter(iselectron) |> Map(energy) |> Take(10^7)
estimate!(Histogram(), xf, particles)

stevengj · October 21, 2019, 2:15am

You can provide whatever API you want on top of a seekable file I/O stream, just as you could with an mmapped array; in either case you would be wrapping it in some object with accessor functions. In your case, it seems like a stream object may be more convenient, since you need to read heterogeneous types?

jw3126 · October 21, 2019, 7:33am

For each file there is a header that encodes the struct layout. All particles in a single file then share this layout, but different files may have different layouts. So the above example has the following features:

- I can throw functions at it that accept a Particle struct and don’t need to write special functions that accept a ParticleStream struct.
- Under the hood, for most particles only the field that decides whether the particle is an electron is decoded.
- Under the hood, for the electrons the energy field is the only further decoded field.

How would I do that with the seekable io?

jw3126 · October 21, 2019, 7:45am

Also, thanks for all the feedback @stevengj. I gather that you think the bitcast solution is bad. Can you comment on why it is bad? In particular, is a bitcast function bad in itself, or is it my implementation?
