Reinterpret fails on empty array

No, that’s not what’s happening here. This case has been added deliberately:

As it’s a new feature, it won’t be backported. Since the reason was a fundamental limitation of the old reinterpret, I don’t think there’s a workaround short of reimplementing the machinery yourself (doesn’t seem like a good idea, as 1.8 is close).

By the way: this whole conversation is pointless. As of Julia 1.8.0, it does not error anymore. Furthermore, it returns what it should: a 0-element reinterpret of type A:

julia> struct A end

julia> reinterpret(A, Int8[])
0-element reinterpret(A, ::Vector{Int8})

Shouldn’t this be commented in the release notes of v1.8.0? @Sukera

How is it not 0/0 = 0?

I think it is, yeah. In the PR it is discussed that, for singleton types, triage was in favor of deciding that 0/0 = 0, although Jeff wasn’t comfortable with it either.

I’m almost 90% sure I tried that and it didn’t work :thinking:

Either way, the fact it does is imo an accident due to that PR - it should clearly throw when the array is not empty, as that’s what is checked in the new branch:

I don’t think you’re meant to change the underlying array, which is why I suggested using a zero-length view instead. It can’t change contents under you and will always be empty, since its index range is fixed.

For example:

julia> struct A end

julia> a = Int8[]
Int8[]

julia> b = reinterpret(A, a)
0-element reinterpret(A, ::Vector{Int8})

julia> push!(a, 0x0)
1-element Vector{Int8}:
 0

julia> b
1-element reinterpret(A, ::Vector{Int8}):
 A()

which surely is not what you want. This won’t happen with a view:

julia> arrv = @view a[1:0]
0-element view(::Vector{Int8}, 1:0) with eltype Int8

julia> b = reinterpret(A, arrv)
0-element reinterpret(A, view(::Vector{Int8}, 1:0))

julia> a
1-element Vector{Int8}:
 0

julia> push!(a, 0x0)
2-element Vector{Int8}:
 0
 0

julia> b
0-element reinterpret(A, view(::Vector{Int8}, 1:0))
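The aliasing is not specific to singleton types. Here is a minimal sketch with ordinary element types (UInt8 and UInt16, picked purely for illustration) showing that a reinterpret wrapper follows its parent array, while a wrapper around an empty view stays empty:

```julia
a = UInt8[0x00, 0x00]

# A reinterpret wrapper is lazy: it shares memory with `a` and
# recomputes its size from the parent every time it is asked.
r = reinterpret(UInt16, a)
r[1] = 0xffff              # writes through to the parent bytes
@assert a == [0xff, 0xff]

# A wrapper around an empty view is pinned to the range 1:0, so it never grows.
rv = reinterpret(UInt16, @view(a[1:0]))

push!(a, 0x00, 0x00)       # grow the parent by two bytes
@assert length(r) == 2     # the plain wrapper now sees four bytes
@assert length(rv) == 0    # the view-backed wrapper is still empty
```
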

I kind of get what you mean. Let us now try to push this logic further, so I can get what I wanted since the beginning:

julia> using StaticArrays

julia> struct A end

julia> reinterpret(SVector{1,A},UInt8[])
0-element reinterpret(SVector{1, A}, ::Vector{UInt8})

Obviously, since an SVector{1,A} is also size 0, the result is a 0-element reinterpret. Is there any way I can get a 1-element object, so I can do only or [1] and get the message I wanted?

This is exactly the issue with the behaviour of reinterpret: the length of reinterpreting a 0-byte array as a singleton type is ambiguous, because the sizeof of an array of any number of singletons is 0.

That is, sizeof([nothing, nothing, nothing]) == 0, so length(reinterpret(Nothing, Int8[])) could just as reasonably be 3 as 0, or any other length.
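To make that concrete, here is a small check. `Empty` is a made-up singleton type used only for this illustration:

```julia
struct Empty end   # hypothetical singleton type, used only for this illustration

@assert sizeof(Empty) == 0

# The byte size of the array is 0 no matter how many elements it holds,
# so the length cannot be recovered from the byte count alone.
for n in (0, 3, 1000)
    @assert sizeof(fill(Empty(), n)) == 0
end

# For element types with a non-zero size the length is unambiguous:
# 4 bytes of UInt8 hold exactly 2 UInt16 values.
@assert length(reinterpret(UInt16, zeros(UInt8, 4))) == 2
```
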

IMO the new behaviour is a mistake, because it makes the arbitrary decision that 0/0=0 and you’re already having issues because in your case 0/0=1.

As my understanding of the situation grows, I am more inclined to think you are absolutely right and (alas) it should be disallowed.

Nonetheless, please do not make the error message be as cryptic as it was before.

No, I don’t think this is right. The reinterpreting of empty stuff is (IMO) fine, as long as both source & target type are sizeof == 0. That then even works for arrays with stuff in them:

julia> struct A end

julia> struct B end

julia> As = [A() for _ in 1:5]
5-element Vector{A}:
 A()
 A()
 A()
 A()
 A()

julia> reinterpret(B, As)
5-element reinterpret(B, ::Vector{A}):
 B()
 B()
 B()
 B()
 B()

What I think is an oversight is that the PR then allows reinterpret(A, Int8[]) as well, which it shouldn’t and which doesn’t really make sense.

On top of this, in the particular case @nandoconde encounters we’re coming back to what I asked in the very first answer of mine: You’re trying to create something (a thing where you can call only without it erroring) out of nothing (an empty slice of a vector containing something). That just doesn’t make sense to me in the first place and I suspect it points to your abstraction/mental model failing here. The only way forward is to ask - what is the surrounding structure that led you to wanting to do this? I think we’re running into a sort of XY-problem, where a different approach to your original problem would be better suited, while at the same time not encountering the specific issue with reinterpret at all.

You mentioned that you were implementing a protocol - does the header not uniquely identify the message already? Why have another reinterpreting step going on there? What are you doing with the result of the reinterpret and would it be better to wrap that in a type, to distinguish the messages?

At present, the element type of the source array is ignored, only the number of bytes in the array matters. This is because reinterpret can output an array of a different size if the source and target element types have different sizes. Unfortunately computing this size leads to an unavoidable 0/0 issue.
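As a rough sketch of that arithmetic (`naive_length` and `Unit` are made-up names, and this is not how Base implements it), the resulting length is just bytes in the source divided by bytes per target element, which breaks down when both are zero:

```julia
# Hypothetical helper: NOT Base's implementation, only the arithmetic described above.
naive_length(::Type{T}, a::AbstractVector) where {T} = sizeof(a) ÷ sizeof(T)

struct Unit end  # hypothetical singleton type

@assert naive_length(UInt16, zeros(UInt8, 4)) == 2   # 4 bytes ÷ 2 bytes
@assert naive_length(UInt8, zeros(UInt16, 4)) == 8   # 8 bytes ÷ 1 byte

# With a zero-size target and an empty source the same formula is 0 ÷ 0,
# which integer division in Julia refuses to evaluate.
err = try
    naive_length(Unit, UInt8[])
catch e
    e
end
@assert err isa DivideError
```
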

I think what you’re advocating for is to add to the API of reinterpret that if the source and target element types are the same size, then the resulting array is the same size as the input. That does clear up this corner case, but makes the API more complicated (depending on whether or not you’re resizing).

No, for things where the sizeof the source & target types is != 0, this already is the case. It’s when you have this mix between “things that take up no space” and “things that take up space” that things go awry. I’m advocating for not allowing “things that take up space” to be reinterpreted as “things that don’t take up space”, because the question without context in and of itself doesn’t make sense, irrespective of whether there actually are things there to be reinterpreted.
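One way to picture that rule is a thin wrapper that refuses to mix zero-size and sized element types and says why. This is only a sketch under my own assumptions: `checked_reinterpret` and `Marker` are hypothetical names, not anything in Base.

```julia
# Hypothetical wrapper, not part of Base: rejects mixing zero-size and sized element types.
function checked_reinterpret(::Type{T}, a::AbstractArray{S}) where {T,S}
    if (sizeof(T) == 0) != (sizeof(S) == 0)
        throw(ArgumentError(
            "cannot reinterpret $S (sizeof = $(sizeof(S))) as $T (sizeof = $(sizeof(T))): " *
            "the resulting length would be ambiguous"))
    end
    return reinterpret(T, a)
end

struct Marker end  # hypothetical singleton type

# Sized -> sized goes through unchanged.
@assert length(checked_reinterpret(UInt16, UInt8[0x01, 0x02])) == 1

# Sized -> zero-size is rejected with a readable message, even for an empty array.
err = try
    checked_reinterpret(Marker, UInt8[])
catch e
    e
end
@assert err isa ArgumentError
```
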

The original issue that the PR was based on also only seemed to deal with converting sizeof(A) == 0 to sizeof(B) == 0, though admittedly I’m not quite sure of the context in Plots.jl that prompted the issue.

Yeah, in case of the XY problem, I can explain what I am doing.

The stream of bytes can be interpreted “as is” into different types of my protocol. Indeed, its length is previously known. But the abstraction works perfectly for non-singleton types. Take this example:

julia> abstract type MSG end

julia> struct A <: MSG
       a::UInt8
       b::UInt8
       c::UInt16
       end

julia> struct B <: MSG
       a::Float64
       b::NTuple{4, UInt32}
       end

julia> struct C <: MSG end

julia> ids = Dict(0x01 => A, 0x02 => B, 0x03 => C)
Dict{UInt8, Type} with 3 entries:
  0x02 => B
  0x03 => C
  0x01 => A

julia> id_byte = 0x01 # Of course, this is sent over the network
0x01

julia> msg = [0x01,0x02,0x03,0x04]
4-element Vector{UInt8}:
 0x01
 0x02
 0x03
 0x04

julia> reinterpret(ids[id_byte], msg)
1-element reinterpret(A, ::Vector{UInt8}):
 A(0x01, 0x02, 0x0403)

julia> id_byte = 0x03 # Same as this
0x03

julia> msg = UInt8[]
UInt8[]

julia> reinterpret(ids[id_byte], msg)
0-element reinterpret(C, ::Vector{UInt8})

In the case of A, or B, I can get the message directly with only(), but not in the case of C.

I don’t know what your code surrounding this looks like, but this seems pretty simply solved by introducing some extra dispatch:

# assume the types & ids dict is defined

function parse_msg(data::AbstractVector{UInt8})::MSG
    id = first(data) # assuming it's all received in-band anyway - could of course also have been passed in
    type = ids[id]
    return build_msg(type, @view(data[2:(1+sizeof(type))]))::type
end

build_msg(::Type{T}, data::AbstractVector{UInt8}) where T <: MSG = only(reinterpret(T, data))
build_msg(::Type{C}, ::AbstractVector{UInt8}) = C() # there is no data for `C` anyway, so just create a new one (second argument is typed to avoid a method ambiguity)

Since C is an immutable singleton, all “instances” are the same anyway. Also, you’re more or less writing a parser, so the type instability is both expected & unavoidable.

If you have the id as an out-of-band message, you can also just skip that first step and interface with build_msg more directly.
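For reference, here is a self-contained version of the dispatch idea that can be pasted into a fresh session. It reuses the MSG, A, C and ids names from the example earlier in the thread (B is left out for brevity). It is only a sketch, assuming the id byte arrives in-band as the first byte. The singleton method annotates its second argument so that it is strictly more specific than the generic method.

```julia
abstract type MSG end

struct A <: MSG
    a::UInt8
    b::UInt8
    c::UInt16
end

struct C <: MSG end   # singleton message: carries no payload

const ids = Dict{UInt8,DataType}(0x01 => A, 0x03 => C)

# Generic case: the payload bytes are reinterpreted as exactly one `T`.
build_msg(::Type{T}, data::AbstractVector{UInt8}) where {T<:MSG} = only(reinterpret(T, data))
# Singleton case: nothing to decode, so just construct the instance.
build_msg(::Type{C}, ::AbstractVector{UInt8}) = C()

function parse_msg(data::AbstractVector{UInt8})
    T = ids[first(data)]                       # in-band id byte
    payload = @view data[2:(1 + sizeof(T))]    # empty range 2:1 for `C`
    return build_msg(T, payload)::T
end

msg_a = parse_msg(UInt8[0x01, 0x01, 0x02, 0x03, 0x04])
msg_c = parse_msg(UInt8[0x03])
```

The value of `msg_a.c` depends on the byte order of the machine, so if the wire format has a fixed endianness you would want to convert explicitly (for example with `ltoh` or `ntoh`).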


I’m not 100% sure how legal doing only there is, but since nothing you’ve shown so far contains pointers or mutable stuff like arrays and is isbits, it should be fine. With those you’ll have to write more code anyway, since reinterpret and pointers/mutable things don’t really get along too well.

Seems like the author of the original PR agrees - reinterpreting of non-singleton to singleton is bad:

https://github.com/JuliaLang/julia/pull/45370

Yes, thank you very much for noticing this!
I think I did not forbid it in the initial PR because I simply could not think of a case where it would matter, and I thought it would not hurt to not throw an error in the case the array was empty, basically following the initial expectation of @nandoconde. But your argument about resizing the initial vector is absolutely compelling, and it shows the unsoundness of having such a ReinterpretArray.

Note that for the particular issue at hand here, if all your data is isbits and you know you won’t go out of bounds of your stream, you can use unsafe_load and pointer to do your conversion at a low level:

function iterate_msg(data::AbstractVector{UInt8}, index)::Tuple{MSG,Int}
   id = data[index]
   type = ids[id]
   value = unsafe_load(Ptr{type}(pointer(data, index+1)))
   newindex = index + 1 + sizeof(type)
   return value, newindex
end

Although the version of @Sukera is safer, so better :wink: Also note that this version does not avoid having to dispatch, it happens at the Ptr{type}(...) conversion.
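If you do go the pointer route, a slightly more defensive variant might look like the sketch below. The names `Packet`, `Header`, `packet_ids` and `iterate_packet` are hypothetical stand-ins for the protocol types. The explicit bounds check and the `GC.@preserve` are additions of mine, on the assumption that the array must stay rooted while a raw pointer into it is in use.

```julia
abstract type Packet end          # hypothetical names standing in for the protocol types

struct Header <: Packet
    a::UInt8
    b::UInt8
    c::UInt16
end

const packet_ids = Dict{UInt8,DataType}(0x01 => Header)

function iterate_packet(data::Vector{UInt8}, index::Int)
    T = packet_ids[data[index]]
    newindex = index + 1 + sizeof(T)
    # Explicit bounds check: `unsafe_load` performs none.
    newindex - 1 <= length(data) || throw(BoundsError(data, newindex - 1))
    # Keep `data` rooted while a raw pointer into it is in use.
    value = GC.@preserve data unsafe_load(Ptr{T}(pointer(data, index + 1)))
    return value, newindex
end

value, next = iterate_packet(UInt8[0x01, 0x01, 0x02, 0x03, 0x04], 1)
```
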

I have a strong aversion to Ptr in otherwise safe languages :stuck_out_tongue: To me at least, if I can’t express something as performant without manual Ptr or unsafe_* as someone else does with Ptr, that’s a bug (though I know some don’t share this sentiment). RefValue and RefArray are better!

It may be that on 1.8+, the allocation of reinterpret is elided, since it doesn’t survive the return from build_msg. Needs profiling though.

Very nice solutions guys, thanks!

And, most of all, thanks for opening the PR to disallow it again :smile: I think this is the right path too