Regression between v0.6 and master with reinterpret from #21831


#1

On v0.6.0-rc2, this works:

julia> primitive type Foo 24 end ; reinterpret(Foo, b"123")
1-element Array{Foo,1}:
 Foo(0x333231)

On v0.7.0-DEV.427, it fails:

julia> primitive type Foo 24 end ; reinterpret(Foo, b"123")
ERROR: ArgumentError: reinterpret from alignment 1 bytes to alignment 4 bytes not allowed
Stacktrace:
 [1] reinterpret(::Type{Foo}, ::Array{UInt8,1}, ::Tuple{Int64}) at ./array.jl:160
 [2] reinterpret(::Type{Foo}, ::Array{UInt8,1}) at ./array.jl:139

This seems to be another regression caused by @yuyichao’s change https://github.com/JuliaLang/julia/pull/21831


#2

This is intentional, and it is exactly the bug that change is intended to catch: computer hardware can’t be relied upon to load data from a string (which is only byte-aligned) as an integer (which expects word alignment).
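To make the alignment point concrete: in a byte vector, only some positions sit at addresses a 4-byte load may assume, while assembling the value from individual bytes works at any position. A minimal sketch, assuming little-endian byte order; the helpers `is_aligned_for` and `load_u32_le` are hypothetical, and `Base.datatype_alignment` is an internal, unexported Julia function:

```julia
# Hypothetical helper: does element i of `v` sit at an address that is a
# multiple of T's alignment? (Base.datatype_alignment is internal, unexported.)
is_aligned_for(::Type{T}, v::Vector{UInt8}, i::Integer) where {T} =
    UInt(pointer(v, i)) % Base.datatype_alignment(T) == 0

# Hypothetical portable load: build a little-endian UInt32 from the four bytes
# starting at index i, one byte at a time, so no alignment is assumed.
function load_u32_le(v::AbstractVector{UInt8}, i::Integer)
    return UInt32(v[i]) | (UInt32(v[i+1]) << 8) |
           (UInt32(v[i+2]) << 16) | (UInt32(v[i+3]) << 24)
end

v = UInt8[0x00, 0x31, 0x32, 0x33, 0x34]
# Of any four consecutive byte positions, exactly one is 4-byte aligned.
aligned_positions = count(i -> is_aligned_for(UInt32, v, i), 1:4)
x = load_u32_le(v, 2)   # 0x34333231, regardless of where v[2] lives
```

The byte-wise version trades a few shifts for not depending on the hardware’s tolerance for unaligned loads.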


#3

I think the bug is that the alignment for a 24-bit value is set to 4, not 1.
The alignment of a primitive type is not necessarily its size rounded up to a power of 2;
for example, a 12-byte primitive type might need anything from 1-byte alignment to 8- or 16-byte alignment (which would require padding when placed into an array).
Also, #21831 seems like overkill, since among the platforms Julia supports, only ARM processors have strict alignment requirements.
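The padding mentioned above follows from the usual layout rule: if every element of an array must start on an `align`-byte boundary, the per-element stride is the size rounded up to a multiple of the alignment. A minimal sketch of that arithmetic, assuming a positive alignment; `array_stride` is a hypothetical helper, not a Julia API:

```julia
# Hypothetical helper: bytes per array element when each element of size `sz`
# must start at a multiple of `align` (size rounded up to the alignment).
array_stride(sz::Integer, align::Integer) = cld(sz, align) * align

array_stride(12, 16)  # 16: 4 bytes of padding per element
array_stride(12, 8)   # 16: 4 bytes of padding per element
array_stride(12, 4)   # 12: no padding
array_stride(3, 4)    # 4: a 24-bit value with 4-byte alignment
array_stride(3, 1)    # 3: a 24-bit value with 1-byte alignment
```

Under this rule, n elements of a 3-byte type take 4n bytes with 4-byte alignment but 3n bytes with 1-byte alignment.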


#4

Note: another bit of evidence that asking for an alignment of 4 for a 24-bit primitive type is a bug: if you create a Vector{Foo} of length n and ask for its sizeof, you get 3n, not 4n, so it is not actually enforcing 4-byte alignment after all.

If this is not going to be fixed, how are primitive types supposed to be constructed?

Also, #21831 has no warnings in NEWS.md, and there wasn’t any deprecation period for this change.


#5

What’s sizeof got to do with it? These numbers can be different without being wrong or inconsistent.
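In Julia the two quantities are queried separately: `sizeof` gives the number of bytes a value occupies, and the internal, unexported `Base.datatype_alignment` gives the alignment the runtime records for the type. A minimal sketch; the reported alignment may vary between Julia versions, so no particular value is assumed:

```julia
primitive type Foo 24 end

# Size and alignment are distinct properties of a type.
size_bytes  = sizeof(Foo)                    # storage size of one Foo
align_bytes = Base.datatype_alignment(Foo)   # internal API: required alignment
println("sizeof(Foo) = ", size_bytes, ", alignment = ", align_bytes)
```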

The same way they always have: as numbers.
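One reading of “as numbers”: compute the value as an ordinary integer, then narrow it to the primitive type, instead of reinterpreting the memory of a byte array. A minimal sketch, assuming little-endian byte order; `foo_from_bytes`, `foo_to_uint32` and `foos_from_bytes` are hypothetical helpers, and `Core.Intrinsics.trunc_int` / `zext_int` are internal intrinsics rather than a documented public API:

```julia
primitive type Foo 24 end

# Assemble three little-endian bytes into a UInt32, then truncate to 24 bits.
foo_from_bytes(b1::UInt8, b2::UInt8, b3::UInt8) =
    Core.Intrinsics.trunc_int(Foo, UInt32(b1) | (UInt32(b2) << 8) | (UInt32(b3) << 16))

# Recover the numeric value by zero-extending back to 32 bits.
foo_to_uint32(f::Foo) = Core.Intrinsics.zext_int(UInt32, f)

# Copy a byte vector (length a multiple of 3) into a Vector{Foo},
# reading one byte at a time so no unaligned load is ever performed.
function foos_from_bytes(bytes::AbstractVector{UInt8})
    n, r = divrem(length(bytes), 3)
    r == 0 || throw(ArgumentError("byte count must be a multiple of 3"))
    i0 = firstindex(bytes)
    return Foo[foo_from_bytes(bytes[i0 + 3k], bytes[i0 + 3k + 1], bytes[i0 + 3k + 2])
               for k in 0:n-1]
end

foos = foos_from_bytes(b"123")   # one Foo holding the value 0x333231
```

This copies rather than aliasing the original bytes, which is the price of making no assumption about their alignment.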

It should be mentioned in NEWS. I know the author also tried to make sure all packages in METADATA were fixed before it was merged. We don’t usually do deprecations when fixing bugs or adding errors for broken code.


#6

Making a vector of a 24-bit primitive type, you can see the elements are not aligned on 4-byte boundaries.
The code generated to access them is correct and reasonably efficient, unlike what you get if you use NTuple{3,UInt8} instead (which was @mbauman’s suggestion on Gitter).
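Whether the elements of such a vector sit on 4-byte boundaries can be checked directly by measuring the distance between consecutive element addresses. A minimal sketch; the stride it reports reflects the layout chosen by the running Julia version, so no particular value is assumed:

```julia
primitive type Foo 24 end

# Measure the distance in bytes between consecutive elements of a Vector{Foo}.
v = Vector{Foo}(undef, 4)
stride_bytes = Int(UInt(pointer(v, 2)) - UInt(pointer(v, 1)))
println("sizeof(Foo) = ", sizeof(Foo), ", element stride = ", stride_bytes, " bytes")
```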

Can you give an example of that then? Previously, it was easy to reinterpret a vector of bytes as a vector of different primitive types (such as the 24-bit one that @sbromberger has a use case for).
That’s not possible any longer.

This change can have serious consequences for performance, and seems to have been made only because of issues on ARM.
This doesn’t seem to really be fixing a bug, at least not on the Intel/AMD or POWER platforms.

This change has broken code that worked perfectly well before. Where was the “bug” (except possibly on ARM platforms)?


#7

For reference: a 24-bit integer would be a really nice sweet spot for medium-size graphs: 64k vertices is too small, but 2^32 is too large. 2^24 (about 16.7 million) vertices handles a lot of common graph structures on commodity hardware, and the savings of 16 bits per edge (vs. 32-bit ints) can really add up.
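The numbers behind this use case can be checked with a few lines of arithmetic. A minimal sketch, assuming each edge is stored as a pair of vertex IDs and using a hypothetical graph of one billion edges purely for illustration:

```julia
# Number of distinct vertex IDs representable in 24 bits.
max_vertices_24 = 2^24                   # 16_777_216, about 16.7 million

# Each edge stores two vertex IDs, each 8 bits smaller than a 32-bit ID.
bits_saved_per_edge = 2 * (32 - 24)      # 16 bits

# Hypothetical graph size, for illustration only (Int64 avoids overflow on 32-bit).
n_edges = Int64(1_000_000_000)
bytes_saved = n_edges * bits_saved_per_edge ÷ 8   # 2_000_000_000 bytes, i.e. 2 GB
```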