Reinterpret fails on empty array

Sukera · May 18, 2022, 11:28am

No, that’s not what’s happening here. This case has been added deliberately:

github.com/JuliaLang/julia

Allow reinterpreting singleton types

JuliaLang:master ← Liozou:reinterpretsingleton

opened 02:56PM - 21 Dec 21 UTC

Liozou

+110 -9

Fix #43403. It does not actually allow reinterpreting all 0-sized types, only the immutable ones (aka singleton types), but I think that's what was intended anyway (see also #17149). I'm not sure whether it should be merged, considering what @vtjnash said in https://github.com/JuliaLang/julia/issues/34865#issuecomment-590945279, but I'm opening the PR in case it is decided it is worth adding. Since `reinterpret` is a fairly basic building block, there is also the potential issue of slowing down critical paths. I don't think there is any harm done in this regard since the `sizeof(T) == 0` and `isdefined(T, :instance)` conditions should all be eliminated at compilation I think, but I'm no expert on this.

As it’s a new feature, it won’t be backported. Since the reason was a fundamental limitation of the old reinterpret, I don’t think there’s a workaround short of reimplementing the machinery yourself (doesn’t seem like a good idea, as 1.8 is close).

nandoconde · May 18, 2022, 1:19pm

By the way: this whole conversation is pointless. As of Julia 1.8.0, it no longer errors. Furthermore, it returns what it should: a 0-element reinterpret of type A:

julia> struct A end

julia> reinterpret(A, Int8[])
0-element reinterpret(A, ::Vector{Int8})

Shouldn’t this be mentioned in the release notes of v1.8.0? @Sukera

DNF · May 18, 2022, 1:42pm

How is it not 0/0 = 0?

nandoconde · May 18, 2022, 2:16pm

I think it is, yeah. The PR discussion says that, for singleton types, triage was in favor of deciding that 0/0 = 0, although Jeff wasn’t comfortable with it either.

Sukera · May 18, 2022, 2:27pm

I’m almost 90% sure I tried that and it didn’t work

Either way, the fact it does is imo an accident due to that PR - it should clearly throw when the array is not empty, as that’s what is checked in the new branch:

github.com

JuliaLang/julia/blob/138c8e6a281b82b1814f3e99eee826bd4c11a992/base/reinterpretarray.jl#L47


      
        @noinline
        throw(ArgumentError("cannot reinterpret a `$(S)` array to `$(T)` when the first axis is $ax1. Try reshaping first."))
    end
    isbitstype(T) || throwbits(S, T, T)
    isbitstype(S) || throwbits(S, T, S)
    (N != 0 || sizeof(T) == sizeof(S)) || throwsize0(S, T, "different")
    if N != 0 && sizeof(S) != sizeof(T)
        ax1 = axes(a)[1]
        dim = length(ax1)
        if issingletontype(T)
            dim == 0 || throwsingleton(S, T, "a non-empty")
        else
            rem(dim*sizeof(S),sizeof(T)) == 0 || thrownonint(S, T, dim)
        end
        first(ax1) == 1 || throwaxes1(S, T, ax1)
    end
    readable = array_subpadding(T, S)
    writable = array_subpadding(S, T)
    new{T, N, S, A, false}(a, readable, writable)
end
reinterpret(::Type{T}, a::AbstractArray{T}) where {T} = a

I don’t think you’re meant to change the underlying array, which is why I suggested using a zero-length view instead. It can’t change contents under you and will always be empty, since the view itself is immutable.

Sukera · May 18, 2022, 2:33pm

For example:

julia> struct A end

julia> a = Int8[]
Int8[]

julia> b = reinterpret(A, a)
0-element reinterpret(A, ::Vector{Int8})

julia> push!(a, 0x0)
1-element Vector{Int8}:
 0

julia> b
1-element reinterpret(A, ::Vector{Int8}):
 A()

which surely is not what you want. This won’t happen with a view:

julia> arrv = @view a[1:0]
0-element view(::Vector{Int8}, 1:0) with eltype Int8

julia> b = reinterpret(A, arrv)
0-element reinterpret(A, view(::Vector{Int8}, 1:0))

julia> a
1-element Vector{Int8}:
 0

julia> push!(a, 0x0)
2-element Vector{Int8}:
 0
 0

julia> b
0-element reinterpret(A, view(::Vector{Int8}, 1:0))

nandoconde · May 18, 2022, 2:48pm

I kind of get what you mean. Let us now try to push this logic further, so I can get what I wanted since the beginning:

julia> using StaticArrays

julia> struct A end

julia> reinterpret(SVector{1,A},UInt8[])
0-element reinterpret(SVector{1, A}, ::Vector{UInt8})

Obviously, since an SVector{1,A} also has size 0, the result is a 0-element reinterpret. Is there any way I can get a 1-element object, so I can do only or [1] and get the message I wanted?

cjdoris · May 18, 2022, 6:42pm

This is exactly the issue with the behaviour of reinterpret: the length of reinterpreting a 0-byte array as a singleton type is ambiguous, because the sizeof of an array of any number of singletons is 0.

That is, sizeof([nothing, nothing, nothing]) == 0, so length(reinterpret(Nothing, Int8[])) could equally reasonably be 3 as 0 or any other length.

IMO the new behaviour is a mistake, because it makes the arbitrary decision that 0/0=0 and you’re already having issues because in your case 0/0=1.
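
To make the ambiguity above concrete, here is a small check. It is only a sketch and relies on nothing beyond Base `sizeof` and `fill`: the byte size of an array of singletons is 0 whatever its length, so the byte count of an empty buffer cannot tell you how many singletons it "contains".

```julia
# Singleton types occupy no storage, so the byte size of an array of them
# carries no information about its length.
@assert sizeof(Nothing) == 0
@assert sizeof(fill(nothing, 3)) == 0
@assert sizeof(fill(nothing, 1000)) == 0

# An empty byte buffer is also 0 bytes, so it is equally "compatible"
# with 0, 3 or 1000 singleton elements.
@assert sizeof(Int8[]) == 0
```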

nandoconde · May 19, 2022, 6:52am

As my understanding of the situation grows, I am more inclined to think you are absolutely right and (alas) it should be disallowed.

Nonetheless, please do not make the error message be as cryptic as it was before.

Sukera · May 19, 2022, 11:08am

No, I don’t think this is right. The reinterpreting of empty stuff is (IMO) fine, as long as both source & target type are sizeof == 0. That then even works for arrays with stuff in them:

julia> struct A end

julia> struct B end

julia> As = [A() for _ in 1:5]
5-element Vector{A}:
 A()
 A()
 A()
 A()
 A()

julia> reinterpret(B, As)
5-element reinterpret(B, ::Vector{A}):
 B()
 B()
 B()
 B()
 B()

What I think is an oversight is that the PR then allows reinterpret(A, Int8[]) as well, which it shouldn’t and which doesn’t really make sense.

On top of this, in the particular case @nandoconde encounters we’re coming back to what I asked in the very first answer of mine: You’re trying to create something (a thing where you can call only without it erroring) out of nothing (an empty slice of a vector containing something). That just doesn’t make sense to me in the first place and I suspect it points to your abstraction/mental model failing here. The only way forward is to ask - what is the surrounding structure that led you to wanting to do this? I think we’re running into a sort of XY-problem, where a different approach to your original problem would be better suited, while at the same time not encountering the specific issue with reinterpret at all.

You mentioned that you were implementing a protocol - does the header not uniquely identify the message already? Why have another reinterpreting step going on there? What are you doing with the result of the reinterpret and would it be better to wrap that in a type, to distinguish the messages?

cjdoris · May 19, 2022, 12:17pm

At present, the element type of the source array is ignored, only the number of bytes in the array matters. This is because reinterpret can output an array of a different size if the source and target element types have different sizes. Unfortunately computing this size leads to an unavoidable 0/0 issue.

I think what you’re advocating for is to add to the API of reinterpret that if the source and target element types are the same size, then the resulting array is the same size as the input. That does clear up this corner case, but makes the API more complicated (depending on whether or not you’re resizing).
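
A rough illustration of the byte-count rule described above, as a sketch rather than the actual Base implementation. `implied_length` and `Empty` are hypothetical names introduced only for this example:

```julia
# Hypothetical helper (not part of Base): the length implied by the byte count
# when `n` elements of type S are viewed as elements of type T.
implied_length(::Type{T}, ::Type{S}, n::Integer) where {T,S} = (n * sizeof(S)) ÷ sizeof(T)

bytes = UInt8[0x01, 0x02, 0x03, 0x04]   # 4 bytes held in 4 elements
words = UInt32[0x04030201]              # 4 bytes held in 1 element

# Only the total number of bytes matters, not how the source splits them up.
@assert length(reinterpret(UInt16, bytes)) == implied_length(UInt16, UInt8, 4)
@assert length(reinterpret(UInt16, words)) == implied_length(UInt16, UInt32, 1)

struct Empty end                        # singleton, sizeof(Empty) == 0
# implied_length(Empty, UInt8, 0) would have to compute 0 ÷ 0,
# which throws a DivideError for integers.
```

With a zero-sized target type the formula has no answer at all, which is the 0/0 corner case this thread is about.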

Sukera · May 19, 2022, 12:33pm

No, for things where the sizeof the source & target types is != 0, this already is the case. It’s when you have this mix between “things that take up no space” and “things that take up space” that things go awry. I’m advocating for not allowing “things that take up space” to be reinterpreted as “things that don’t take up space”, because the question without context in and of itself doesn’t make sense, irrespective of whether there actually are things there to be reinterpreted.

The original issue that the PR was based on also only seemed to deal with converting sizeof(A) == 0 to sizeof(B) == 0, though admittedly I’m not quite sure of the context in Plots.jl that prompted the issue.

nandoconde · May 19, 2022, 1:11pm

Yeah, in case of the XY problem, I can explain what I am doing.

The stream of bytes can be interpreted “as is” into different types of my protocol. Indeed, its length is known beforehand. But the abstraction works perfectly for non-singleton types. Take this example:

julia> abstract type MSG end

julia> struct A <: MSG
       a::UInt8
       b::UInt8
       c::UInt16
       end

julia> struct B <: MSG
       a::Float64
       b::NTuple{4, UInt32}
       end

julia> struct C <: MSG end

julia> ids = Dict(0x01 => A, 0x02 => B, 0x03 => C)
Dict{UInt8, Type} with 3 entries:
  0x02 => B
  0x03 => C
  0x01 => A

julia> id_byte = 0x01 # Of course, this is sent over the network
0x01

julia> msg = [0x01,0x02,0x03,0x04]
4-element Vector{UInt8}:
 0x01
 0x02
 0x03
 0x04

julia> reinterpret(ids[id_byte], msg)
1-element reinterpret(A, ::Vector{UInt8}):
 A(0x01, 0x02, 0x0403)

julia> id_byte = 0x03 # Same as this
0x03

julia> msg = UInt8[]
UInt8[]

julia> reinterpret(ids[id_byte], msg)
0-element reinterpret(C, ::Vector{UInt8})

In the case of A, or B, I can get the message directly with only(), but not in the case of C.

Sukera · May 19, 2022, 1:40pm

I don’t know what your code surrounding this looks like, but this seems pretty simply solved by introducing some extra dispatch:

# assume the types & ids dict is defined

function parse_msg(data::AbstractVector{UInt8})::MSG
    id = first(data) # assuming it's all received in-band anyway - could of course also have been passed in
    type = ids[id]
    # the payload follows the id byte and is exactly sizeof(type) bytes long
    return build_msg(type, @view(data[2:(1+sizeof(type))]))::type
end

build_msg(::Type{T}, data::AbstractVector{UInt8}) where T <: MSG = only(reinterpret(T, data))
# there is no data for `C` anyway, so just create a new one; the second argument is
# typed so this method is strictly more specific than the generic one (no ambiguity)
build_msg(::Type{C}, ::AbstractVector{UInt8}) = C()

Since C is an immutable singleton, all “instances” are the same anyway. Also, you’re more or less writing a parser, so the type instability is both expected & unavoidable.

If you have the id as an out-of-band message, you can also just skip that first step and interface with build_msg more directly.

I’m not 100% sure how legal doing only there is, but since nothing you’ve shown so far contains pointers or mutable stuff like arrays and is isbits, it should be fine. With those you’ll have to write more code anyway, since reinterpret and pointers/mutable things don’t really get along too well.
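
For reference, a self-contained sketch of the dispatch approach above that can be pasted into a fresh session. It reuses the `MSG`, `A`, `C` and `ids` names from earlier in the thread, leaves out `B` for brevity, and assumes the id byte arrives in-band as the first byte:

```julia
abstract type MSG end

struct A <: MSG
    a::UInt8
    b::UInt8
    c::UInt16
end

struct C <: MSG end   # singleton: carries no payload

const ids = Dict{UInt8,DataType}(0x01 => A, 0x03 => C)

# Generic case: the payload bytes are reinterpreted as exactly one `T`.
build_msg(::Type{T}, data::AbstractVector{UInt8}) where {T<:MSG} = only(reinterpret(T, data))
# Singleton case: there is nothing to read, so construct the instance directly.
build_msg(::Type{C}, ::AbstractVector{UInt8}) = C()

function parse_msg(data::AbstractVector{UInt8})::MSG
    type = ids[first(data)]                      # first byte selects the message type
    payload = @view data[2:(1 + sizeof(type))]   # empty view when sizeof(type) == 0
    return build_msg(type, payload)
end

msg_a = parse_msg(UInt8[0x01, 0x01, 0x02, 0x03, 0x04])
msg_c = parse_msg(UInt8[0x03])
```

On a little-endian machine `msg_a` should match the `A(0x01, 0x02, 0x0403)` shown earlier, and `msg_c` is simply `C()`; the singleton never goes through reinterpret at all.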

Sukera · May 19, 2022, 2:51pm

Seems like the author of the original PR agrees - reinterpreting of non-singleton to singleton is bad:

https://github.com/JuliaLang/julia/pull/45370

Liozou · May 19, 2022, 3:50pm

Yes, thank you very much for noticing this!
I think I did not forbid it in the initial PR because I simply could not think of a case where it would matter, and I thought it would not hurt to not throw an error in the case the array was empty, basically following the initial expectation of @nandoconde. But your argument about resizing the initial vector is absolutely compelling, and it shows the unsoundness of having such a ReinterpretArray.

Note that for the particular issue at hand here, if all your data is isbits and you know you won’t go out of bounds of your stream, you can use unsafe_load and pointer to do your conversion at a low level:

function iterate_msg(data::AbstractVector{UInt8}, index)::Tuple{MSG,Int}
   id = data[index]
   type = ids[id]
   value = unsafe_load(Ptr{type}(pointer(data, index+1)))
   newindex = index + 1 + sizeof(type)
   return value, newindex
end

Although the version of @Sukera is safer, so better. Also note that this version does not avoid having to dispatch: it happens at the Ptr{type}(...) conversion.

Sukera · May 19, 2022, 3:56pm

I have a strong aversion to Ptr in otherwise safe languages. To me at least, if I can’t express something as performantly without manual Ptr or unsafe_* as someone else does with Ptr, that’s a bug (though I know some don’t share this sentiment). RefValue and RefArray are better!

It may be that on 1.8+, the allocation of reinterpret is elided, since it doesn’t survive the return from build_msg. Needs profiling though.

nandoconde · May 19, 2022, 6:14pm

Very nice solutions guys, thanks!

And, most of all, thanks for opening the PR to disallow it again. I think this is the right path too.
