`reinterpret` to a single value from an array of a smaller data type


#1

Currently on 0.7 if I want to reinterpret from a smaller data type to a larger one, I have to return an array, e.g.

reinterpret(Float64, zeros(UInt8, 8))
# returns Float64[0.0]

This results in an unnecessary allocation and makes reinterpret far more expensive for retrieving individual elements (by a factor of 3, according to my experiments). Is there any “safe” way around this, i.e. a safe equivalent to unsafe_load, even in principle? This seems really important if reinterpret is to have any chance of being a viable alternative to the unsafe_ methods.


#2

Can you pre-allocate the reinterpreted array and then just write into its source data? For example:

julia> x = zeros(UInt8, 8)
8-element Array{UInt8,1}:
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00

julia> y = reinterpret(Float64, x)
1-element reinterpret(Float64, ::Array{UInt8,1}):
 0.0

julia> x[1] = 1
1

julia> y
1-element reinterpret(Float64, ::Array{UInt8,1}):
 5.0e-324

Now you can copy into x as many times as you want and then copy out the result by taking y[1]. Still potentially expensive, but at least you don’t have to keep allocating new arrays.
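The reuse pattern described here could be wrapped up roughly as follows. This is only a sketch assuming Julia 1.x; the names `scratch`, `scratch_view`, and `decode_float64!` are made up for illustration and are not part of Base.

```julia
# Hypothetical helper: copy 8 bytes of `src`, starting at index `i`, into a
# reusable scratch buffer and read them back through an existing
# reinterpret view over that buffer.
function decode_float64!(scratch::Vector{UInt8}, scratch_view, src::Vector{UInt8}, i::Integer)
    copyto!(scratch, 1, src, i, 8)   # overwrite the 8 source bytes
    return scratch_view[1]           # the view sees the new bytes
end

scratch = zeros(UInt8, 8)                    # allocated once
scratch_view = reinterpret(Float64, scratch) # 1-element view, created once

# 16 bytes holding two Float64 values in host byte order
data = collect(reinterpret(UInt8, [1.0, 2.5]))

first_value = decode_float64!(scratch, scratch_view, data, 1)
second_value = decode_float64!(scratch, scratch_view, data, 9)
```

The buffer and the view are created once and passed in, so each call only copies 8 bytes and indexes the view.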


#3

That doesn’t really work: it would require the larger-type array (in this case Float64) to be persistent, which seems impractical.

Basically the use case is that I have a big Vector{UInt8} and I want to be able to pull values from it without allocating.

As I was writing this I suddenly realized that the intended use of reinterpret is probably to have a ReinterpretArray just sitting around. This is so different from how I was doing things with unsafe_wrap that now I’m kind of terrified…
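For what it's worth, a "ReinterpretArray just sitting around" might look something like the sketch below. It assumes Julia 1.x, and the variable names are only illustrative.

```julia
# One big byte buffer and one long-lived view over it (no copy of the data).
buf = zeros(UInt8, 32)
floats = reinterpret(Float64, buf)   # 4-element view over the 32 bytes

floats[2] = 3.25                     # writes bytes 9:16 of `buf`
x = floats[2]                        # reads the same 8 bytes back

# A value at a chosen byte range: reinterpret a contiguous 8-byte view.
y = reinterpret(Float64, view(buf, 9:16))[1]
```

Whether indexing such a view is fast enough for a given workload is something to benchmark rather than assume.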


#4

Did you see

>?reinterpret

Warning: It is not allowed to reinterpret an array to an element type with a larger alignment than the alignment of the array. For example, reinterpret(UInt32, UInt8[0, 0, 0, 0]) is not allowed


#5

That’s on 0.6, we are on 0.7.


#6

There was also a thread about converting bits to ints, where a way to wrap a type for bit access came up that may be handy. There don’t appear to be binary & methods defined for float types in 0.6, though.


#7

Use bit shifts. Shift each UInt8 by a different amount and OR them together.
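A minimal sketch of the shift-and-OR idea, assuming Julia 1.x and a little-endian byte layout in the buffer. The function names `bytes_to_uint64` and `bytes_to_float64` are hypothetical.

```julia
# Assemble 8 bytes, starting at index `i`, into a UInt64.
# Byte `i` becomes the least significant byte (little-endian layout).
function bytes_to_uint64(bytes::AbstractVector{UInt8}, i::Integer=1)
    x = UInt64(0)
    for k in 0:7
        x |= UInt64(bytes[i + k]) << (8 * k)
    end
    return x
end

# Scalar reinterpret between two 64-bit primitive types returns a plain
# value, not an array.
bytes_to_float64(bytes::AbstractVector{UInt8}, i::Integer=1) =
    reinterpret(Float64, bytes_to_uint64(bytes, i))

one_bytes = UInt8[0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f]
value = bytes_to_float64(one_bytes)
```

Because the byte order is spelled out in the shifts, the result does not depend on the byte order of the machine running the code.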


#8

@ExpandingMan Take a look at SASLib.jl’s conversion function
It should work in both Julia v0.6 and v0.7.


#9

A bit off-topic, but are floating-point types in Julia kind of opaque? reinterpret falls back to a ccall, and other binary operators like &, |, << have no methods for floats.


#10

What do you suggest eg & do with a Float64?


#11

Float64 is still a bitstype, so 0.0 & 0x01 could give the first (or last) bit, ignoring the length difference in that example. It’s not too unreasonable that these aren’t defined, though.
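One way to get at individual bits without defining & on floats is to go through the unsigned integer of the same width first. The sketch below assumes Julia 1.x and the IEEE 754 binary64 layout; the helper names are invented for this example.

```julia
# Raw 64-bit pattern of a Float64 (scalar reinterpret, no array involved).
float_bits(x::Float64) = reinterpret(UInt64, x)

# IEEE 754 binary64 fields: 1 sign bit, 11 exponent bits, 52 fraction bits.
sign_bit(x::Float64)      = float_bits(x) >> 63
exponent_bits(x::Float64) = (float_bits(x) >> 52) & 0x7ff
fraction_bits(x::Float64) = float_bits(x) & 0x000fffffffffffff

lowest_bit_of_zero = float_bits(0.0) & 0x01
```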

It might be easier to define a native julia floating or fixed point value

This works too

x = IOBuffer();
write(x, Float64(1));
seek(x,0);
z = read(x)

#12

If you have a tuple of UInt8's, the following gets completely optimized out for all Unsigned types:

I haven’t found a no op way to create byte tuples from UIntX.


#13

AFAICT the “standard” technique for that is with reinterpret, see eg


#14

Thanks for all the responses.

I think it would be really nice if we had this ability in Base with some variant of reinterpret. I’ve been working on an IO project that involves retrieving data from and writing data to IO buffers and I was hopeful that reinterpret would be able to replace all of the unsafe methods at some point in the near future, but I think we are still very far from that point. It’s definitely going to be really difficult to get it there, because for applications like this it’s going to be absolutely performance critical, and getting reinterpret performance to be on par with that of direct reference wrapping is going to be challenging.


#15

The very basics are mostly working. If it can imitate a Float64, then it may give fairly fast and easy bit access.

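module xf8
  # 8.8 fixed-point value: var1 is the integer byte, var2 the fractional byte (units of 1/256)
  mutable struct fixedp8
    var1::UInt8
    var2::UInt8
  end
  v1(x::Number) = floor(UInt8, x % 256)      # integer part, modulo 256
  v2(x::Number) = v1(256 * (x - v1(x)))      # fractional part, scaled to 0..255
  valFl(x::fixedp8) = Float64(x.var1) + x.var2 / 256       # value as a Float64
  valUi(x::fixedp8) = (UInt16(x.var1) << 8) | x.var2       # raw 16-bit pattern, kept as UInt16
  fixedp8(x::Number) = fixedp8(v1(x), v2(x))
  fixedp8(x::UInt16) = fixedp8(UInt8(x >> 8), UInt8(x & 0xff))   # build from a raw pattern
  # Arithmetic works on the raw patterns and wraps around modulo 2^16
  Base.:+(x::fixedp8, y::fixedp8) = fixedp8(valUi(x) + valUi(y))
  Base.:-(x::fixedp8, y::fixedp8) = fixedp8(valUi(x) - valUi(y))
  # The raw product has 16 fractional bits, so shift back down to 8
  Base.:*(x::fixedp8, y::fixedp8) = fixedp8(((UInt32(valUi(x)) * valUi(y)) >> 8) % UInt16)
  Base.show(io::IO, x::fixedp8) = print(io, valFl(x))
end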

#16

I would love to have a version of reinterpret that takes a bitstype and returns an NTuple{N, UInt8} without overhead.
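Short of a built-in, a shift-based conversion is one way to write this for unsigned integers. The sketch assumes Julia 1.x; `to_bytes` and `from_bytes` are hypothetical names, and whether the compiler reduces them to a no-op has not been checked here.

```julia
# Split an unsigned integer into a tuple of bytes, least significant first.
to_bytes(x::Unsigned) =
    ntuple(i -> (x >> (8 * (i - 1))) % UInt8, Val(sizeof(x)))

# Rebuild an unsigned integer of type T from such a tuple.
function from_bytes(::Type{T}, t::NTuple{N,UInt8}) where {T<:Unsigned,N}
    N == sizeof(T) || throw(ArgumentError("expected $(sizeof(T)) bytes, got $N"))
    x = zero(T)
    for i in 1:N
        x |= T(t[i]) << (8 * (i - 1))
    end
    return x
end

bytes = to_bytes(0x01020304)          # literal is a UInt32
roundtrip = from_bytes(UInt32, bytes)
```

For a Float64, the same functions could be applied to `reinterpret(UInt64, x)`.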


#17

That’s rather slow though.
To get speed, you’d need to get a pointer, which you can reinterpret to a different pointer type (essentially a no-op), although beware of alignment issues on the 32-bit ARM platform.


#18

You might want to take a look at what I’m doing in the Strs package.
To have fast access to codeunits of various sizes (1, 2, 4 bytes), but keeping them stored in a buffer allocated as a String with Base._string_n, I have two functions get_codeunit and set_codeunit!.
These are used to wrap up the unsafe_load and unsafe_store! operations.
I also use the GC.@preserve macro to be safe, and make sure all bounds checking is done in advance.
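A rough sketch of that wrapping pattern on a plain Vector{UInt8}, assuming Julia 1.x. This is not the Strs implementation: `load_at` and `store_at!` are made-up names, and offsets are zero-based byte offsets.

```julia
# Load a value of isbits type T from `buf` at a zero-based byte offset.
# Bounds are checked up front; GC.@preserve keeps `buf` alive while the
# raw pointer is in use.
function load_at(::Type{T}, buf::Vector{UInt8}, offset::Integer) where {T}
    isbitstype(T) || throw(ArgumentError("T must be an isbits type"))
    checkbounds(buf, (offset + 1):(offset + sizeof(T)))
    val = GC.@preserve buf unsafe_load(convert(Ptr{T}, pointer(buf, offset + 1)))
    return val
end

# Store `x` into `buf` at a zero-based byte offset, with the same checks.
function store_at!(buf::Vector{UInt8}, offset::Integer, x::T) where {T}
    isbitstype(T) || throw(ArgumentError("T must be an isbits type"))
    checkbounds(buf, (offset + 1):(offset + sizeof(T)))
    GC.@preserve buf unsafe_store!(convert(Ptr{T}, pointer(buf, offset + 1)), x)
    return buf
end

buf = zeros(UInt8, 16)
store_at!(buf, 8, 2.5)
loaded = load_at(Float64, buf, 8)
```

The example uses an offset that is a multiple of 8. Unaligned offsets may be a problem on some platforms, as noted above for 32-bit ARM.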


#19

Yeah, it does seem a bit crazy. Like I said above, I think we have a long way to go before we can do IO stuff without the use of Ptr. I believe we can get there, but I don’t think it’s going to be a 1.0 thing as I had initially hoped.


#20

There are many methods already provided for safely reading/writing primitive bit elements from IO. For example, read(io, Float64) or read!(io, zeros(Float64, N)). It’s also fairly easy to define a new one: read(io, ::Type{T}) where {T <: S} = read!(io, Ref{T}())[] and write(io, x::S) = write(io, Ref(x))
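As a small illustration of those built-in methods, here is a sketch using an in-memory IOBuffer (Julia 1.x assumed; the variable names are arbitrary).

```julia
io = IOBuffer()
write(io, 1.5)               # 8 bytes
write(io, Int32(7))          # 4 bytes
write(io, [1.0, 2.0, 3.0])   # 24 bytes
seekstart(io)

a = read(io, Float64)        # single value, no intermediate array
b = read(io, Int32)

out = Vector{Float64}(undef, 3)   # preallocated destination
read!(io, out)                    # fills `out` in place
```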

Have you seen FixedPointNumbers?