Dealing with complex C structure

The official Julia doc for calling C/Fortran code says:

Packed structs and union declarations are not supported by Julia.

So, does that mean I won’t be able to call the C code? I’m trying to deal with code that takes the following structure… thanks for any kind help here.

struct abc
{
  union {
    double x;
    char *y;
    struct {
        struct abc *array;
        unsigned short a;
        unsigned short b;
    } z;
  } w;
  unsigned short v;
};

Taking the approach from the doc:

You can get a near approximation of a union if you know, a priori, the field that will have the greatest size (potentially including padding). When translating your fields to Julia, declare the Julia field to be only of that type.

That’s not too bad. Here’s a MWE:

/* test.c */
typedef struct abc
{
        union {
                unsigned int a;
                long b;
        } ab;
        int c;
} ABC, *ABCP;

ABC make1()
{
        ABC x;
        x.c = 1;
        x.ab.b = 2;
        return x;
}

ABC make2()
{
        ABC x;
        x.c = 1;
        x.ab.a = 2;
        return x;
}

Compile as usual:

$ gcc -fPIC -shared test.c -o libtest.so
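Before mirroring the struct on the Julia side, it can help to ask the C compiler which union member is largest and where c ends up. The following is only a sketch: the helper function names are made up for illustration, and the numbers depend on the ABI (on a typical 64-bit Linux build, where long is 8 bytes, I would expect the union to be 8 bytes and the whole struct 16).

```c
#include <stddef.h>

/* Same layout as the ABC struct in test.c */
typedef struct abc {
    union {
        unsigned int a;
        long b;
    } ab;
    int c;
} ABC;

/* sizeof does not evaluate its operand, so the null pointer is never dereferenced */
size_t size_of_a(void)     { return sizeof(((ABC *)0)->ab.a); }
size_t size_of_b(void)     { return sizeof(((ABC *)0)->ab.b); }
size_t size_of_union(void) { return sizeof(((ABC *)0)->ab); }

/* Where the field after the union starts, and the total size with tail padding */
size_t offset_of_c(void)   { return offsetof(ABC, c); }
size_t size_of_abc(void)   { return sizeof(ABC); }
```

Printing these five values from a small main() tells you which Julia type has to stand in for the union and how many bytes the Julia struct must occupy in total.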

In Julia, I can treat the union as a regular long (the largest member in this example), that is, with ABC declared as a struct holding ab::Clong followed by c::Cint:

julia> const TEST = "/tmp/libtest.so"
"/tmp/libtest.so"

julia> x = ccall((:make1, TEST), ABC, ())
ABC(2, 1)

julia> y = ccall((:make2, TEST), ABC, ())
ABC(5815918277548834818, 1)

So the value returned by make2 is obviously messed up: C only wrote the 4-byte unsigned int member, so the other 4 bytes of the union hold leftover garbage, and I am reading all 8 bytes as a long. To correct that, I can do some bit twiddling (not my favorite, but some people here like it):

julia> UInt64(y.ab)
0x50b64adc00000002

julia> Int32(y.ab & 0x00000000ffffffff)
2
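The same effect can be reproduced on the C side. This is a minimal sketch, assuming a little-endian machine and an 8-byte union; the function names are hypothetical. It fills 8 bytes with a stale pattern, stores a 4-byte value at offset 0 (as make2 does), and then reads all 8 bytes back.

```c
#include <stdint.h>
#include <string.h>

/* 8 bytes of stale data, then a 4-byte store at offset 0, read back as 64 bits */
uint64_t raw_after_partial_write(uint32_t value)
{
    unsigned char storage[8];
    uint64_t raw;

    memset(storage, 0xAA, sizeof storage);   /* stand-in for leftover stack content */
    memcpy(storage, &value, sizeof value);   /* only the first 4 bytes are written */
    memcpy(&raw, storage, sizeof raw);       /* read the whole 8-byte slot */
    return raw;
}

/* Keep only the low 32 bits, like y.ab & 0x00000000ffffffff in Julia */
uint32_t low_half(uint64_t raw)
{
    return (uint32_t)(raw & 0xffffffffu);
}

/* Returns 1 when the least significant byte is stored first */
int is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a big-endian machine the 4-byte value would land in the high half instead, so the mask would have to become a shift.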

I really need a way to reinterpret the thing so perhaps I just go with a fixed size tuple.

julia> struct ABC2
         ab::NTuple{8, Cuchar}   # 8 bytes
         c::Int32
       end

julia> z = ccall((:make2, TEST), ABC2, ())
ABC2((0x02, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00), 1)

julia> reinterpret(Int32, collect(z.ab[1:4]))
1-element Array{Int32,1}:
 2

Next, I’ll try to reinterpret to another struct. Hopefully that will work.

Does this all make sense? Is there any better way to approach this?

An easy “implementation-defined” variant on 64 bit is the following:

struct abc_x
    a::Float64
    padding::Int32
    v::UInt16
end

struct abc_y
    y::Ptr{UInt8}
    padding::Int32
    v::UInt16
end

struct abc_z
    array::Ptr{Void}
    a::UInt16
    b::UInt16
    v::UInt16
end

@assert sizeof(abc_x)==sizeof(abc_y)==sizeof(abc_z);

You can freely reinterpret between these three variants and pass any of them to your C function (and reinterpret outputs). If you want to be fancy, you can give the struct abc pointer (array) a more reasonable type than Ptr{Void}.

You get no UB with respect to the explicit padding bytes. (If the padding were implicit in julia, reinterpret could give undefined results: it basically vomits register content into the padding, just like C does, except that C knows that all maybe-relevant bytes in the union must be preserved.)

If your C code has #pragma pack then you need to figure out the layout, by debugger or spec-reading (in this specific example there should be no difference). I am not entirely sure how to make a variant that also works on 32bit; maybe use an if sizeof(Ptr{Int})==8 and write two versions by hand? Do you care about 32bit?
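Instead of a debugger, the compiler itself can report what packing does to the layout. The sketch below is an assumption-laden illustration: the struct names def_natural and def_packed and the helper functions are made up, and #pragma pack(push, 1) is used as one example of a packing directive accepted by gcc, clang and MSVC. On a typical x86-64 gcc build I would expect 24 bytes for the natural layout and 14 for the packed one, so it seems worth checking rather than assuming the two agree.

```c
#include <stddef.h>

/* Natural (unpacked) layout */
struct def_natural {
    union {
        double x;
        char *y;
        struct {
            struct def_natural *array;
            unsigned short a;
            unsigned short b;
        } z;
    } w;
    unsigned short v;
};

/* Same members, but every struct/union defined in this region is byte-packed */
#pragma pack(push, 1)
struct def_packed {
    union {
        double x;
        char *y;
        struct {
            struct def_packed *array;
            unsigned short a;
            unsigned short b;
        } z;
    } w;
    unsigned short v;
};
#pragma pack(pop)

size_t natural_size(void)     { return sizeof(struct def_natural); }
size_t packed_size(void)      { return sizeof(struct def_packed); }
size_t natural_v_offset(void) { return offsetof(struct def_natural, v); }
size_t packed_v_offset(void)  { return offsetof(struct def_packed, v); }
```

Whatever numbers come out, the Julia struct has to reproduce them byte for byte, with explicit padding fields where the C compiler inserted padding.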

(advantage of using int/ptr-types instead of NTuple{8,UInt8}: endian-ness is automatically correct; but everything is little-endian today anyway, so whatever. Even if the other side expects network byte order, the int-type tells julia about the alignment)


Post-script: I think the canonical 0.7 variant would be something like

struct abc
    f1::UInt64
    f2::UInt16
    f3::UInt16
    f4::UInt16
end

with getproperty and setproperty!(Ptr{abc}, ...) and constructors overloaded (so no user-facing reinterpret needed, and the nesting is the same as in C).

Someone should write a package for this (which preferably also does #pragma pack, endian-ness and 32bit), with a macro @julia_struct_from_C_like that does all the computations and definitions (maybe even in a way that can eat the header-file post pre-processor). Maybe Keno’s Cxx.jl already does the job? Maybe it will do the job in the future?

I am not entirely sure whether this always works, though; that is, whether julia gets confused when writing into a UInt16 that is part of a UInt64. (They form a union, but aliasing/alignment analysis may get confused in the end. You want the UInt16 write to be atomic and 2-byte-aligned, and not to modify the remaining bits in multi-threaded code; on the other hand, you want the UInt64 reads/writes to be 8-byte-aligned, so reinterpreting from four UInt16 does not cut it. What is clang doing?)

(Personally, I would also make the meaningless structure padding at the end explicit. You save some cycles by leaving it undefined, but making it explicit avoids all the headaches of “did I handle a password/crypto key and need to sanitize the padding before serialization?” and avoids human errors in size calculations; so just add another padding_end::UInt16 at the end.)


I don’t have time to dig more at the moment but a quick test shows that reinterpreting Ptr{Void} as a Float64 crashes julia. Will come back to this later tonight.

julia> struct DEF2 
         mess1::Ptr{Void}
         a::UInt16
         b::UInt16
         pad2::NTuple{4, Cuchar}
         v::Int16
         pad3::NTuple{6, Cuchar}
       end

julia> z = ccall((:make5, TEST), DEF2, ())
DEF2(Ptr{Void} @0x4008000000000000, 0x19e8, 0x0c71, (0x9f, 0x7f, 0x00, 0x00), 1, (0x00, 0x00, 0x00, 0x00, 0x00, 0x00))

julia> reinterpret(Float64, z.mess1)
LLVM ERROR: Cannot select: 0x7f9f0aa2b520: f64 = truncate 0x7f9f0b89cfc0
  0x7f9f0b89cfc0: i64,ch = CopyFromReg 0x7f9f09c5c030, Register:i64 %vreg1
    0x7f9f0a9df690: i64 = Register %vreg1
In function: julia_reinterpret_62748

where the C code looks like:

typedef struct def
{
		union  {
				double x;   /* 8 bytes */
				char *y;    /* 8 bytes as for 64-bit system */
				struct  {    
						struct def *array; /* 8 bytes */
						unsigned short a;  /* 2 bytes */
						unsigned short b;  /* 2 bytes + padded 4 bytes */
				} z;
		} w;
		unsigned short v;  /* 2 bytes -> padded to 8 bytes (64-bit system) */
} DEF, *DEFP;

DEF make5()
{
		DEF x;
		x.v = 1;
		x.w.x = 3.0;
		return x;
}

That’s what I get for mixing julia versions. Reinterpret between Float64 and pointers works fine on 0.7; on 0.6 you can go via reinterpret(Float64, reinterpret(UInt64,C_NULL)).

So you could put a reinterpret to a pure uint struct in between? This should be a no-op, but it is really ugly.

I guess this is a bug. I’ll see whether I find it in the tracker and otherwise open an issue.

Also, are you sure about your offsets? I thought the C struct would be:

typedef struct def
{  /* always 8-byte aligned; has bytes 1:16 */
		union  {
				double x;   /* bytes 1:8 */
				char *y;    /* bytes 1:8 */
				struct  {    
						struct def *array; /* bytes 1:8 */
						unsigned short a;  /* bytes 9:10*/
						unsigned short b;  /* bytes 11:12 */
				} z; /* bytes 1:12 */
		} w; /* bytes 1:12 */
		unsigned short v;  /* bytes 13:14*/
      /*Padding: bytes 15:16 ; filled with UB*/
} DEF, *DEFP;

If you can get away with passing pointers to structs, then all this conversion hell goes away: build your chunk of memory somewhere and pass the pointer to the ccall. Then you just reinterpret the pointer and use unsafe_load/unsafe_store! to modify or reinterpret your union.
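On the C side, the pointer-based style could look roughly like this. It is a sketch only: fill5 and read_x are hypothetical counterparts of make5, not functions from the code under discussion. The caller owns the memory and the C function writes into it, so nothing has to be returned by value.

```c
#include <string.h>

typedef struct def {
    union {
        double x;
        char *y;
        struct {
            struct def *array;
            unsigned short a;
            unsigned short b;
        } z;
    } w;
    unsigned short v;
} DEF;

/* Hypothetical pointer-based counterpart of make5(): fill caller-owned memory */
void fill5(DEF *out)
{
    memset(out, 0, sizeof *out);   /* also clears the padding bytes */
    out->v = 1;
    out->w.x = 3.0;
}

/* Read the union through the member that was last written */
double read_x(const DEF *p)
{
    return p->w.x;
}
```

From Julia, the caller would then allocate sizeof(DEF) bytes, pass the pointer to the ccall, and read individual fields at their known offsets.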

Looks like UInt64 is most reinterpretation-friendly for 8-byte fields. I can reinterpret it to double or char* or a pointer. The only drawback is that if I have anything smaller than 8 bytes then I would have to shift/convert.
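For comparison, this is what "go through a 64-bit unsigned integer" means at the bit level, sketched in C with memcpy (the function names are made up). It assumes IEEE 754 doubles, which is what essentially every current platform uses. Note that 3.0 has the bit pattern 0x4008000000000000, which is exactly the "pointer" value printed for mess1 in the DEF2 output earlier in the thread.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 8 bytes of a double into an unsigned 64-bit integer, bit for bit */
uint64_t double_to_bits(double d)
{
    uint64_t u;

    memcpy(&u, &d, sizeof u);
    return u;
}

/* The inverse: interpret 64 raw bits as a double */
double bits_to_double(uint64_t u)
{
    double d;

    memcpy(&d, &u, sizeof d);
    return d;
}
```

Both directions are plain byte copies, so a round trip through the integer loses nothing.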

Sorry, I don’t understand this question.

Yes, I’m certain as I went through a couple of scenarios. Apparently gcc requires the inner struct to be 8-byte aligned as well so it added an extra 4 bytes for that.
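A way to settle the offsets without guessing is to let the compiler report them with offsetof. This is a sketch with made-up helper names; the values in the comments are what I would expect on a typical 64-bit gcc build, where the inner struct is rounded up to a multiple of its 8-byte alignment, and they may differ on other ABIs.

```c
#include <stddef.h>

typedef struct def {
    union {
        double x;
        char *y;
        struct {
            struct def *array;
            unsigned short a;
            unsigned short b;
        } z;
    } w;
    unsigned short v;
} DEF;

size_t offset_of_a(void) { return offsetof(DEF, w.z.a); }      /* expected 8 on LP64 */
size_t offset_of_b(void) { return offsetof(DEF, w.z.b); }      /* expected 10 on LP64 */
size_t size_of_w(void)   { return sizeof(((DEF *)0)->w); }     /* expected 16 on LP64 */
size_t offset_of_v(void) { return offsetof(DEF, v); }          /* expected 16 on LP64 */
size_t size_of_def(void) { return sizeof(DEF); }               /* expected 24 on LP64 */
```

With those numbers, a Julia mirror needs 4 bytes of explicit padding after b and 6 bytes after v, which is what the DEF2 definition above does.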

So I opened an issue on https://github.com/JuliaLang/julia/issues/26053.

A quick fix for your code is

Base.reinterpret(::Type{Ptr{T}}, x::Float64) where T = reinterpret(Ptr{T}, reinterpret(UInt64, x))
Base.reinterpret(::Type{Float64}, x::Ptr) = reinterpret(Float64, reinterpret(UInt64, x))

Unfortunately I don’t understand the code-base well enough to submit this as a PR; someone who really knows it needs to check whether other pointer-reinterpret cases cause problems.

(I tested the @code_llvm and correctness for this quick-fix, but have no idea whether there are other problems; also, you need to modify this for 32 bit systems )

(also this might not emit the fastest @code_native because of register-juggling between xmm and general purpose registers; but this shouldn’t matter for your case)


See also StrPack.jl and StructIO.jl. They don’t know about unions as far as I can tell, but IIRC at least StrPack knows about packing rules. Perhaps getters/setters could be created automatically with getproperty overloading. I believe StructIO is the more modern/updated because keno and staticfloat are using it for object file work, so ideally add or port any missing functionality there.

I’m trying to wrap SDL2 Events but I’m not sure I’m doing it right.

SDL Event is a big union of all the possible event types, padded so that it is always 56 bytes long, so I think Clang.jl did the right thing here:

mutable struct Event
    _Event::NTuple{56, Uint8}
end

I can declare and fill an Event with:

ev = SDL2.Event(ntuple(i -> 0xff, 56))
SDL2.PollEvent(pointer_from_objref(ev))

Then the first field of this indicates which type of event it is (in this particular case 1024 is a WindowEvent), so I can try to convert it:

unsafe_load( Ptr{SDL2.WindowEvent}(pointer_from_objref(ev)) )
SDL2.WindowEvent(1024, 2856, 1, 0, 0, 495, 544, 10, 10)

This seems to work, but I’m not sure it’s correct and it’s a bit ugly. Is there a better way to do it?
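The pattern being used here is a tagged union: a fixed-size buffer whose first field says how to interpret the rest. A self-contained C sketch of that pattern follows. All names in it (DemoEvent, DemoMotionEvent, DEMO_MOTION, the helper functions) are hypothetical and are not the SDL2 definitions; only the 56-byte size and the "read the tag first" idea are taken from the post.

```c
#include <stdint.h>
#include <string.h>

enum { DEMO_EVENT_SIZE = 56, DEMO_MOTION = 1024 };   /* hypothetical values */

/* One hypothetical event variant; the tag is always the first field */
typedef struct {
    uint32_t type;
    uint32_t timestamp;
    int32_t  x;
    int32_t  y;
} DemoMotionEvent;

/* The union is padded so every variant fits in the same fixed-size slot */
typedef union {
    uint32_t        type;
    DemoMotionEvent motion;
    uint8_t         padding[DEMO_EVENT_SIZE];
} DemoEvent;

/* Read the tag from the start of the raw buffer */
uint32_t event_type(const uint8_t *buf)
{
    uint32_t t;

    memcpy(&t, buf, sizeof t);
    return t;
}

/* Copy the buffer into the matching variant; returns 0 if the tag does not match */
int decode_motion(const uint8_t *buf, DemoMotionEvent *out)
{
    if (event_type(buf) != DEMO_MOTION)
        return 0;
    memcpy(out, buf, sizeof *out);
    return 1;
}
```

The Julia version does the same two steps: look at the first 4 bytes of the 56-byte tuple, then unsafe_load the buffer as the struct type that the tag selects, so it matters that the tag value and the struct type really belong together.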

I had to solve a similar issue with Gtk many years ago: https://github.com/JuliaGraphics/Gtk.jl/blob/d0a218011bbb3e30934bdabd780131d1eaa6e3d0/src/gdk.jl#L74

Although these days I might instead use getproperty overloading to simulate union-field access on the mutable struct Event object: basically what you are doing, but wrapped in a nice API so that users don’t see the ugly @gc_preserve / pointer_from_objref / unsafe_load mess.