`reinterpret` to a single value from an array of a smaller data type

ExpandingMan · March 23, 2018, 11:56pm

Currently, on 0.7, if I want to reinterpret from a smaller data type to a larger one, I have to get an array back, e.g.

reinterpret(Float64, zeros(UInt8, 8))
# returns Float64[0.0]

This results in an unnecessary allocation, and it makes reinterpret much more expensive for retrieving individual elements (by a factor of 3, according to my experiments). Is there any "safe" way around this, i.e. a safe equivalent to unsafe_load, even in principle? This seems really important if reinterpret is to have any chance of being a viable alternative to the unsafe_ methods.

rdeits · March 24, 2018, 1:31am

Can you pre-allocate the reinterpreted array and then just write into its source data? For example:

julia> x = zeros(UInt8, 8)
8-element Array{UInt8,1}:
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00

julia> y = reinterpret(Float64, x)
1-element reinterpret(Float64, ::Array{UInt8,1}):
 0.0

julia> x[1] = 1
1

julia> y
1-element reinterpret(Float64, ::Array{UInt8,1}):
 5.0e-324

Now you can copy into x as many times as you want and then copy out the result by taking y[1]. Still potentially expensive, but at least you don’t have to keep allocating new arrays.
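A minimal sketch of that idea, assuming Julia 1.x: a persistent 8-byte scratch buffer with a Float64 view over it, refilled from an arbitrary byte offset of a larger vector. The names scratch, view64 and float_at are hypothetical, and the shared global buffer is not thread-safe.

```julia
# Persistent scratch buffer and a lazy Float64 view that shares its memory.
const scratch = zeros(UInt8, 8)
const view64 = reinterpret(Float64, scratch)

# Copy the 8 bytes starting at buf[i] into the scratch buffer, then read them
# back through the view; no new array is allocated per call.
function float_at(buf::AbstractVector{UInt8}, i::Integer)
    copyto!(scratch, 1, buf, i, 8)
    return view64[1]
end
```

The view is created once, so each call only pays for an 8-byte copy.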

ExpandingMan · March 24, 2018, 1:46am

That doesn't really work; it would require the larger-type array (in this case Float64) to be persistent, which seems impractical.

Basically the use case is that I have a big Vector{UInt8} and I want to be able to pull values from it without allocating.

As I was writing this I suddenly realized that the intended use of reinterpret is probably to have a ReinterpretArray just sitting around. This is so different from how I was doing things with unsafe_wrap that now I’m kind of terrified…
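A sketch of the "ReinterpretArray just sitting around" pattern, assuming Julia 1.x and made-up sample bytes: wrap the whole byte vector once and index the view. This only reaches values whose byte offsets are multiples of sizeof(Float64).

```julia
# Sample buffer: the raw bytes of three Float64 values (24 bytes).
bytes = collect(reinterpret(UInt8, [1.0, 2.5, -3.0]))

# Wrap the whole buffer once; this is a lazy view sharing memory with `bytes`.
vals = reinterpret(Float64, bytes)

x2 = vals[2]          # reads bytes[9:16] as a Float64

# Writes through the view land in the original byte vector.
vals[3] = 4.0
```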

y4lu · March 24, 2018, 2:32am

Did you see the docstring?

help?> reinterpret

Warning: It is not allowed to reinterpret an array to an element type with a larger alignment than the alignment of the array. For example, reinterpret(UInt32, UInt8[0, 0, 0, 0]) is not allowed

ExpandingMan · March 24, 2018, 2:33am

That's on 0.6; we are on 0.7.

y4lu · March 24, 2018, 4:59am

There was also a thread about converting bits to ints, where a way of wrapping a type for bit access came up that may be handy. There don't appear to be bitwise & methods defined for float types in 0.6, though.

kristoffer.carlsson · March 24, 2018, 6:40am

Use bit shifts. Shift each UInt8 by a different amount and OR them together.
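One way the shift-and-OR suggestion might look, as a sketch: load_le and load_f64 are hypothetical helpers, and little-endian byte order (least significant byte first) is an assumption about the data.

```julia
# Assemble an unsigned integer of type T from sizeof(T) bytes starting at buf[i],
# shifting each byte into place and OR-ing it in (little-endian order assumed).
function load_le(::Type{T}, buf::AbstractVector{UInt8}, i::Integer) where {T<:Unsigned}
    x = zero(T)
    for k in 0:sizeof(T)-1
        x |= T(buf[i + k]) << (8k)
    end
    return x
end

# A Float64 is then recovered by reinterpreting the UInt64 bit pattern.
load_f64(buf::AbstractVector{UInt8}, i::Integer) = reinterpret(Float64, load_le(UInt64, buf, i))
```

Reinterpreting a scalar between same-sized bits types does not allocate, so the only work is the shifting.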

tk3369 · March 24, 2018, 7:20am

@ExpandingMan Take a look at SASLib.jl’s conversion function
It should work in both Julia v0.6 and v0.7.

y4lu · March 24, 2018, 8:53am

A bit off-topic, but are floating-point types in Julia kind of opaque, since reinterpret falls back to a ccall and bitwise operators like &, |, and << have no methods for floats?

Tamas_Papp · March 24, 2018, 9:06am

What do you suggest, e.g., & should do with a Float64?

y4lu · March 24, 2018, 9:19am

Float64 is still a bits type, so 0.0 & 0x01 could plausibly give the first (or last) bit, ignoring the length difference in that example. It's not unreasonable that these aren't defined, though.
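For comparison, the usual route to the bits of a Float64 is to reinterpret the scalar as a UInt64 of the same size and use the integer operators on that. A sketch assuming Julia 1.x, where bitstring replaced 0.6's bits:

```julia
x = 1.0
raw = reinterpret(UInt64, x)          # same 64 bits, viewed as an unsigned integer
lowest = raw & 0x01                   # least significant bit (end of the mantissa)
sign_bit = raw >> 63                  # the top bit is the sign
neg = reinterpret(Float64, raw | (UInt64(1) << 63))   # set the sign bit, back to Float64
s = bitstring(x)                      # all 64 bits as a String of '0' and '1'
```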

It might be easier to define a native Julia floating-point or fixed-point type.

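This also works, for getting the raw bytes of a Float64:

x = IOBuffer()
write(x, Float64(1))   # write the 8 bytes of 1.0 into the buffer
seek(x, 0)             # rewind to the start
z = read(x)            # read all remaining bytes as a Vector{UInt8}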

gdkrmr · March 24, 2018, 9:53am

If you have a tuple of UInt8s, the following gets completely optimized out for all Unsigned types:

I haven't found a no-op way to create byte tuples from UIntX types.

Tamas_Papp · March 24, 2018, 12:05pm

AFAICT the "standard" technique for that is reinterpret; see e.g.

ExpandingMan · March 24, 2018, 1:39pm

Thanks for all the responses.

I think it would be really nice if we had this ability in Base with some variant of reinterpret. I’ve been working on an IO project that involves retrieving data from and writing data to IO buffers and I was hopeful that reinterpret would be able to replace all of the unsafe methods at some point in the near future, but I think we are still very far from that point. It’s definitely going to be really difficult to get it there, because for applications like this it’s going to be absolutely performance critical, and getting reinterpret performance to be on par with that of direct reference wrapping is going to be challenging.

y4lu · March 24, 2018, 2:14pm

The very basics are mostly working; if it can imitate a Float64, it may give fairly fast and easy bit access:

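module xf8
  # Unsigned 8.8 fixed-point number: var1 holds the integer byte, var2 the fractional byte
  struct fixedp8
    var1::UInt8
    var2::UInt8
  end
  v1(x::Number) = floor(UInt8, x % 256)    # integer part mod 256 (x assumed non-negative)
  v2(x::Number) = v1(256 * (x - v1(x)))    # fractional part scaled to 0..255
  valFl(x::fixedp8) = Float64(x.var1) + x.var2 / 256
  valUi(x::fixedp8) = (UInt16(x.var1) << 8) | x.var2   # raw value, kept as UInt16
  fixedp8(x::Number) = fixedp8(v1(x), v2(x))
  fixedp8(x::UInt16) = fixedp8(UInt8(x >> 8), UInt8(x & 0xff))   # from a raw value
  Base.:+(x::fixedp8, y::fixedp8) = fixedp8(valUi(x) + valUi(y))   # wraps on overflow
  Base.:-(x::fixedp8, y::fixedp8) = fixedp8(valUi(x) - valUi(y))   # wraps on underflow
  # the raw product carries 16 fractional bits, so shift back down by 8
  Base.:*(x::fixedp8, y::fixedp8) = fixedp8(((UInt32(valUi(x)) * valUi(y)) >> 8) % UInt16)
  Base.show(io::IO, x::fixedp8) = print(io, valFl(x))
end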

gdkrmr · March 24, 2018, 2:25pm

I would love to have a version of reinterpret that takes a bits type and returns an NTuple{N, UInt8} without overhead.
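A sketch of a shift-based alternative for unsigned integers (floats can go through reinterpret to a same-sized UInt first). to_bytes is a hypothetical name, bytes come out least significant first, and whether every call compiles down to a no-op is not guaranteed; @code_llvm is one way to check.

```julia
# Split an unsigned integer into a tuple of its bytes, least significant first,
# using shifts and truncating conversion to UInt8.
to_bytes(x::T) where {T<:Unsigned} = ntuple(i -> (x >> (8 * (i - 1))) % UInt8, Val(sizeof(T)))
```

For example, to_bytes(0x1234) gives (0x34, 0x12).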

ScottPJones · March 24, 2018, 5:20pm

That's rather slow, though.
To get speed, you'd need to get a pointer, which you can reinterpret to a different pointer type
(essentially a no-op), although beware of alignment issues on the 32-bit ARM platform.
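A sketch of the pointer approach, assuming Julia 1.x: load_at is a hypothetical helper that checks bounds up front and uses GC.@preserve to keep buf rooted while its raw pointer is in use. Unaligned loads are assumed to be acceptable on the target, which is the 32-bit ARM caveat above.

```julia
# Read a value of bits type T starting at byte index i of buf by converting the
# byte pointer to a Ptr{T} and loading through it.
function load_at(::Type{T}, buf::Vector{UInt8}, i::Integer) where {T}
    isbitstype(T) || throw(ArgumentError("T must be a bits type"))
    checkbounds(buf, i:i+sizeof(T)-1)           # all bytes of the value must exist
    GC.@preserve buf begin
        p = convert(Ptr{T}, pointer(buf, i))   # pointer cast, no data is copied
        unsafe_load(p)                         # possibly unaligned load
    end
end
```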

ScottPJones · March 24, 2018, 5:25pm

You might want to take a look at what I’m doing in the Strs package.
To get fast access to code units of various sizes (1, 2, or 4 bytes) while keeping them stored in a buffer allocated as a String with Base._string_n, I have two functions, get_codeunit and set_codeunit!.
These are used to wrap up the unsafe_load and unsafe_store! operations.
I also use the GC.@preserve macro to be safe, and make sure all bounds checking is done in advance.
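A rough sketch of that wrapping pattern. get_cu and set_cu! are hypothetical stand-ins, not the Strs API, and the buffer here is a plain Vector{UInt8} rather than one created with Base._string_n.

```julia
const CodeUnit = Union{UInt8,UInt16,UInt32}

# Read the i-th code unit of type T from buf; the bounds check runs before the
# pointer is used, and GC.@preserve keeps buf alive during the raw access.
function get_cu(::Type{T}, buf::Vector{UInt8}, i::Integer) where {T<:CodeUnit}
    @boundscheck checkbounds(buf, i * sizeof(T))   # last byte of unit i must exist
    GC.@preserve buf unsafe_load(convert(Ptr{T}, pointer(buf)), i)
end

# Store v as the i-th code unit of type T in buf.
function set_cu!(::Type{T}, buf::Vector{UInt8}, i::Integer, v) where {T<:CodeUnit}
    @boundscheck checkbounds(buf, i * sizeof(T))
    GC.@preserve buf unsafe_store!(convert(Ptr{T}, pointer(buf)), convert(T, v), i)
    return buf
end
```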

ExpandingMan · March 24, 2018, 6:10pm

Yeah, it does seem a bit crazy. Like I said above, I think we have a long way to go before we can do IO stuff without the use of Ptr. I believe we can get there, but I don’t think it’s going to be a 1.0 thing as I had initially hoped.

jameson · March 26, 2018, 4:29am

There are many methods already provided for safely reading and writing primitive bits values from IO. For example, read(io, Float64) or read!(io, zeros(Float64, N)). It's also fairly easy to define a new one (here S stands for the type you want to support): read(io, ::Type{T}) where {T <: S} = read!(io, Ref{T}())[] and write(io, x::S) = write(io, Ref(x))
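Applied to the original problem, a sketch assuming Julia 1.x and made-up sample bytes: an IOBuffer operating on the existing byte vector lets typed values be read out one at a time, or in bulk into a preallocated array with read!.

```julia
# Sample data: the raw bytes of two Float64 values.
bytes = collect(reinterpret(UInt8, [1.5, -2.25]))

io = IOBuffer(bytes)          # in-memory stream over the existing vector
a = read(io, Float64)         # first 8 bytes
b = read(io, Float64)         # next 8 bytes

# Read several values at once into a preallocated array.
seekstart(io)
out = zeros(Float64, 2)
read!(io, out)

# Writing goes the other way: write appends the bytes of a value.
out_io = IOBuffer()
write(out_io, 3.0)
written = take!(out_io)       # the 8 bytes of 3.0
```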

Have you seen FixedPointNumbers?