C calling convention mismatch?

I’m using PackageCompiler.jl to build a shared library, which I then link into a C driver program. One of my @ccallable functions takes a “large” (168-byte) struct as an argument by value. Things worked fine on macOS, but my CI on Linux was failing. I found that a later pointer argument held a different address on the Julia side than the one C had passed. It took me a little while to figure out, but the large struct passed by value was also arriving with different bytes. This looks like some kind of inconsistency in the calling convention, with Julia reading the wrong bytes for its arguments, but I’m in over my head here.

I gather that one way to fix this is to just avoid passing large structs by value, and instead pass pointers. But I thought I’d ask about it here to try to learn a bit more.
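For readers following along, here is a minimal sketch of the two signatures on the C side. The `Large` layout, the function names, and the `scale` parameter are all hypothetical stand-ins, since the real struct is not shown in this thread; the only detail taken from the post is the 168-byte size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 168-byte struct (21 doubles * 8 bytes). */
typedef struct {
    double values[21];
} Large;

static_assert(sizeof(Large) == 168, "Large is expected to be 168 bytes");

/* By-value form: how the struct travels (registers, stack copy, hidden
 * pointer) is decided by the platform ABI, so caller and callee must agree. */
double sum_by_value(int32_t n, Large s, const double *scale) {
    double sum = 0.0;
    for (int32_t i = 0; i < n && i < 21; i++) {
        sum += s.values[i] * *scale;
    }
    return sum;
}

/* By-pointer form: only an address crosses the boundary, so every argument
 * is a plain integer or pointer and the struct-passing rules never apply. */
double sum_by_pointer(int32_t n, const Large *s, const double *scale) {
    double sum = 0.0;
    for (int32_t i = 0; i < n && i < 21; i++) {
        sum += s->values[i] * *scale;
    }
    return sum;
}
```

Within a single C program both forms compute the same result. The by-pointer form simply removes the struct-passing rules from the boundary between the C driver and the compiled Julia library.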

Is this a bug, or is this unavoidable?

It’s going to be hard to help without a MWE.

Were you using the same compiler on both systems? Different C compilers and platform ABIs can pass structs by value differently.

Sorry for the delay. I can’t share the overall code, and it took a little work to extract a MWE. I pushed it up here: danielmatz/Julia-C-Calling-Convention-Test on GitHub.

I tried different compilers and Julia versions for the MWE (recorded in the README), but didn’t notice any particular pattern.

The failing cases are all on our compute cluster, and the working cases are all on my laptop.

Interestingly, the incorrect pointer isn’t random garbage - it is actually the correct pointer minus 184 bytes. And 184 is 168 + 16, with 168 being the size of the large struct.
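To make that arithmetic easy to check in a test harness, a small helper along these lines could report the signed byte distance between the address the caller passed and the address the callee saw. The name `byte_offset` is hypothetical, and the sketch assumes both addresses have been collected as plain `void *` values (for example, printed or stored by each side).

```c
#include <stdint.h>

/* Signed distance in bytes from `expected` to `received`.
 * Goes through intptr_t because the two addresses need not belong to the
 * same array, so direct pointer subtraction would not be well defined. */
int64_t byte_offset(const void *expected, const void *received) {
    return (int64_t)((intptr_t)received - (intptr_t)expected);
}
```

With the numbers above, `byte_offset(correct, incorrect)` would come out as -184, that is, -(168 + 16).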

I’m not sure what this means exactly - perhaps you are seeing another pointer to the same stack, just the other side of the large struct.

Or perhaps the correct pointer is getting mangled a small amount for some reason.

No solution but a breadcrumb!

Since I’m in over my head, I tried using ChatGPT to help me out. So take this all with a grain of salt, since I can’t assess if ChatGPT is right or not.

If I dump the LLVM IR for my example C program on my Mac and then again on the Linux server (x86_64), I can see that the large struct argument is being treated differently in C on the two systems.

On my Mac:

call void @test_large_struct(i64 %36, ptr noundef %5, ptr noundef %4)

And then on the server:

call void @test_large_struct(i32 %32, ptr noundef byval(%struct.Large) align 8 %3, ptr noundef %4)

If I then look at the shared library compiled with PackageCompiler.jl on the two systems, ChatGPT tells me that the compiled code matches the LLVM IR assumptions on my Mac but not on Linux. But I don’t know assembly well enough to confirm that.

On my Mac:

_test_large_struct.3:
000000000034f1a0	sub	sp, sp, #0xf0
000000000034f1a4	stp	x24, x23, [sp, #0xb0]
000000000034f1a8	stp	x22, x21, [sp, #0xc0]
000000000034f1ac	stp	x20, x19, [sp, #0xd0]
000000000034f1b0	stp	x29, x30, [sp, #0xe0]
000000000034f1b4	adrp	x8, 3757 ; 0x11fc000
000000000034f1b8	adrp	x9, 49110 ; 0xc325000
000000000034f1bc	mov	x22, x0
000000000034f1c0	ldr	x10, [x8, #0x5a0]
000000000034f1c4	ldr	x8, [x9, #0x9d0]
000000000034f1c8	mov	x19, x2
000000000034f1cc	mov	x21, x1
000000000034f1d0	mov	x0, x8
000000000034f1d4	blr	x10
000000000034f1d8	cbz	x0, 0x34f200
000000000034f1dc	ldr	x8, [x0, #0x10]
000000000034f1e0	mov	x20, x0
000000000034f1e4	ldrb	w23, [x8, #0x19]
000000000034f1e8	add	x9, x8, #0x19
000000000034f1ec	stlrb	wzr, [x9]
000000000034f1f0	cbnz	w23, 0x34f20c
000000000034f1f4	ldr	x8, [x8, #0x10]
000000000034f1f8	ldr	xzr, [x8]
000000000034f1fc	b	0x34f20c
000000000034f200	mov	w23, #0x2
000000000034f204	bl	0xd7de50 ; symbol stub for: _ijl_autoinit_and_adopt_thread
000000000034f208	mov	x20, x0
000000000034f20c	ldp	q0, q1, [x21]
000000000034f210	adrp	x9, 3677 ; 0x11ac000
000000000034f214	str	w22, [sp, #0xac]
000000000034f218	adrp	x8, 49092 ; 0xc313000
000000000034f21c	add	x8, x8, #0xa98
000000000034f220	adrp	x10, 3719 ; 0x11d6000
000000000034f224	stp	q0, q1, [sp]
000000000034f228	ldp	q2, q0, [x21, #0x20]
000000000034f22c	stp	q2, q0, [sp, #0x20]
000000000034f230	ldp	q1, q0, [x21, #0x40]
000000000034f234	stp	q1, q0, [sp, #0x40]
000000000034f238	ldp	q2, q0, [x21, #0x60]
000000000034f23c	stp	q2, q0, [sp, #0x60]
000000000034f240	ldp	q1, q0, [x21, #0x80]
000000000034f244	ldr	d2, [x21, #0xa0]
000000000034f248	ldr	x21, [x20, #0x8]
000000000034f24c	stp	q1, q0, [sp, #0x80]
000000000034f250	str	d2, [sp, #0xa0]
000000000034f254	ldr	x9, [x9, #0x310]
000000000034f258	ldar	x11, [x8]
000000000034f25c	ldr	x8, [x10, #0x568]
000000000034f260	ldar	x9, [x9]
000000000034f264	cmp	x11, x9
000000000034f268	str	x9, [x20, #0x8]
000000000034f26c	b.eq	0x34f294
000000000034f270	sub	x0, x20, #0x98
000000000034f274	adrp	x1, 3719 ; 0x11d6000
000000000034f278	add	x1, x1, #0x568
000000000034f27c	adrp	x2, 49092 ; 0xc313000
000000000034f280	add	x2, x2, #0xa98
000000000034f284	adrp	x3, 3700 ; 0x11c3000
000000000034f288	add	x3, x3, #0x50
000000000034f28c	bl	0xd7e1d4 ; symbol stub for: _jl_get_abi_converter
000000000034f290	mov	x8, x0
000000000034f294	add	x0, sp, #0xac
000000000034f298	mov	x1, sp
000000000034f29c	mov	x2, x19
000000000034f2a0	blr	x8
000000000034f2a4	ldr	x8, [x20, #0x10]
000000000034f2a8	str	x21, [x20, #0x8]
000000000034f2ac	add	x9, x8, #0x19
000000000034f2b0	stlrb	w23, [x9]
000000000034f2b4	ldr	x8, [x8, #0x10]
000000000034f2b8	ldr	xzr, [x8]
000000000034f2bc	ldp	x29, x30, [sp, #0xe0]
000000000034f2c0	ldp	x20, x19, [sp, #0xd0]
000000000034f2c4	ldp	x22, x21, [sp, #0xc0]
000000000034f2c8	ldp	x24, x23, [sp, #0xb0]
000000000034f2cc	add	sp, sp, #0xf0
000000000034f2d0	ret

On the Linux server:

0000000001cde070 <test_large_struct>:
 1cde070:	55                   	push   %rbp
 1cde071:	48 89 e5             	mov    %rsp,%rbp
 1cde074:	48 89 f2             	mov    %rsi,%rdx
 1cde077:	48 8d 05 82 1f 8b 01 	lea    0x18b1f82(%rip),%rax        # 3590000 <_GLOBAL_OFFSET_TABLE_>
 1cde07e:	48 b9 c0 61 4e 00 00 	movabs $0x4e61c0,%rcx
 1cde085:	00 00 00 
 1cde088:	48 8b 04 08          	mov    (%rax,%rcx,1),%rax
 1cde08c:	48 8d 75 10          	lea    0x10(%rbp),%rsi
 1cde090:	5d                   	pop    %rbp
 1cde091:	ff e0                	jmpq   *%rax
 1cde093:	66 66 66 66 2e 0f 1f 	data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
 1cde09a:	84 00 00 00 00 00 

Does anyone have thoughts on this? I can’t tell if I have a stupid mistake somewhere, or if I’ve uncovered a bug in Julia or PackageCompiler.jl…

Can you reproduce the problem without using PackageCompiler, by having Julia call C rather than the other way around? Or alternatively by using @cfunction to generate a C function pointer from Julia at runtime, and then passing it to C to call?
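For the second experiment, the C side could look roughly like the sketch below: a harness that accepts a function pointer, calls it with a recognizable struct by value plus a trailing pointer, and checks what comes back. Everything here is an assumption for illustration: the `Large` layout, the callback signature (modeled loosely on the `i32, struct, ptr` call shown earlier), and the names `large_cb`, `call_with_large`, and `reference_cb`. In the real test the pointer would come from Julia's `@cfunction`; a plain C callback stands in for it here.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 168-byte struct (21 doubles * 8 bytes). */
typedef struct {
    double values[21];
} Large;

/* Callback shape: an int, the struct by value, then a trailing pointer. */
typedef void (*large_cb)(int32_t tag, Large s, double *out);

/* Calls `cb` with values[i] = i and returns 1 if the callback wrote the
 * expected sum (0 + 1 + ... + 20 = 210) through the trailing pointer. */
int call_with_large(large_cb cb) {
    Large s;
    for (int i = 0; i < 21; i++) {
        s.values[i] = (double)i;
    }
    double out = -1.0;
    cb(7, s, &out);
    return out == 210.0;
}

/* Reference callback written in C; it reports the sum only if the tag
 * argument also arrived intact. */
void reference_cb(int32_t tag, Large s, double *out) {
    double sum = 0.0;
    for (int i = 0; i < 21; i++) {
        sum += s.values[i];
    }
    *out = (tag == 7) ? sum : -1.0;
}
```

If `call_with_large(reference_cb)` succeeds but the same call fails with a Julia-generated pointer, that would point at the Julia side of the boundary rather than at PackageCompiler.jl specifically.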

Thank you all for your ideas and suggestions. I’m still working through some of them.

But in the meantime, I was able to test more configurations, and I think I’ve narrowed down when this issue was introduced. Things seem to work as I would expect with Julia 1.10.10, but then break with Julia 1.10.11. This holds across different versions of PackageCompiler.jl.