Since I’m in over my head, I tried using ChatGPT to help me out. So take all of this with a grain of salt, since I can’t assess whether ChatGPT is right.
If I dump the LLVM IR for my example C program on my Mac (arm64) and then again on the Linux server (x86_64), I can see that the C compiler passes the large struct argument differently on the two systems.
On my Mac:
call void @test_large_struct(i64 %36, ptr noundef %5, ptr noundef %4)
And then on the server:
call void @test_large_struct(i32 %32, ptr noundef byval(%struct.Large) align 8 %3, ptr noundef %4)
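For anyone who wants to compare the IR on their own machine, here is a minimal sketch of this kind of signature. The fields of `struct Large`, the `int` first argument, the `double *out` parameter and the helper `make_large` are assumptions for illustration, not the actual program. The 168-byte size is chosen only because that is how many bytes the arm64 listing further down copies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: 21 doubles = 168 bytes, 8-byte aligned. */
struct Large {
    double values[21];
};

_Static_assert(sizeof(struct Large) == 168, "stand-in should be 168 bytes");

/* Takes the struct by value: sums it into *out, then overwrites its own copy. */
void test_large_struct(int tag, struct Large s, double *out)
{
    assert(out != NULL);
    double sum = (double)tag;
    for (size_t i = 0; i < 21; i++) {
        sum += s.values[i];
        s.values[i] = -1.0;  /* only the callee's copy changes */
    }
    *out = sum;
}

/* Hypothetical helper: fills the struct with 0, 1, ..., 20. */
struct Large make_large(void)
{
    struct Large s;
    for (size_t i = 0; i < 21; i++) {
        s.values[i] = (double)i;
    }
    return s;
}
```

Compiling something like this with `clang -S -emit-llvm` on each machine should show the same two shapes as above: a plain `ptr` argument on arm64, where the caller makes the copy and passes its address, and a `byval` pointer on x86_64, where the copy lives in the stack argument area. Either way the C semantics are the same, and the caller's struct is untouched after the call.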
If I then look at the shared library compiled with PackageCompiler.jl on the two systems, ChatGPT tells me that the compiled code matches the calling convention shown in the LLVM IR on my Mac but not on Linux. But I don’t know assembly well enough to confirm that.
On my Mac:
_test_large_struct.3:
000000000034f1a0 sub sp, sp, #0xf0
000000000034f1a4 stp x24, x23, [sp, #0xb0]
000000000034f1a8 stp x22, x21, [sp, #0xc0]
000000000034f1ac stp x20, x19, [sp, #0xd0]
000000000034f1b0 stp x29, x30, [sp, #0xe0]
000000000034f1b4 adrp x8, 3757 ; 0x11fc000
000000000034f1b8 adrp x9, 49110 ; 0xc325000
000000000034f1bc mov x22, x0
000000000034f1c0 ldr x10, [x8, #0x5a0]
000000000034f1c4 ldr x8, [x9, #0x9d0]
000000000034f1c8 mov x19, x2
000000000034f1cc mov x21, x1
000000000034f1d0 mov x0, x8
000000000034f1d4 blr x10
000000000034f1d8 cbz x0, 0x34f200
000000000034f1dc ldr x8, [x0, #0x10]
000000000034f1e0 mov x20, x0
000000000034f1e4 ldrb w23, [x8, #0x19]
000000000034f1e8 add x9, x8, #0x19
000000000034f1ec stlrb wzr, [x9]
000000000034f1f0 cbnz w23, 0x34f20c
000000000034f1f4 ldr x8, [x8, #0x10]
000000000034f1f8 ldr xzr, [x8]
000000000034f1fc b 0x34f20c
000000000034f200 mov w23, #0x2
000000000034f204 bl 0xd7de50 ; symbol stub for: _ijl_autoinit_and_adopt_thread
000000000034f208 mov x20, x0
000000000034f20c ldp q0, q1, [x21]
000000000034f210 adrp x9, 3677 ; 0x11ac000
000000000034f214 str w22, [sp, #0xac]
000000000034f218 adrp x8, 49092 ; 0xc313000
000000000034f21c add x8, x8, #0xa98
000000000034f220 adrp x10, 3719 ; 0x11d6000
000000000034f224 stp q0, q1, [sp]
000000000034f228 ldp q2, q0, [x21, #0x20]
000000000034f22c stp q2, q0, [sp, #0x20]
000000000034f230 ldp q1, q0, [x21, #0x40]
000000000034f234 stp q1, q0, [sp, #0x40]
000000000034f238 ldp q2, q0, [x21, #0x60]
000000000034f23c stp q2, q0, [sp, #0x60]
000000000034f240 ldp q1, q0, [x21, #0x80]
000000000034f244 ldr d2, [x21, #0xa0]
000000000034f248 ldr x21, [x20, #0x8]
000000000034f24c stp q1, q0, [sp, #0x80]
000000000034f250 str d2, [sp, #0xa0]
000000000034f254 ldr x9, [x9, #0x310]
000000000034f258 ldar x11, [x8]
000000000034f25c ldr x8, [x10, #0x568]
000000000034f260 ldar x9, [x9]
000000000034f264 cmp x11, x9
000000000034f268 str x9, [x20, #0x8]
000000000034f26c b.eq 0x34f294
000000000034f270 sub x0, x20, #0x98
000000000034f274 adrp x1, 3719 ; 0x11d6000
000000000034f278 add x1, x1, #0x568
000000000034f27c adrp x2, 49092 ; 0xc313000
000000000034f280 add x2, x2, #0xa98
000000000034f284 adrp x3, 3700 ; 0x11c3000
000000000034f288 add x3, x3, #0x50
000000000034f28c bl 0xd7e1d4 ; symbol stub for: _jl_get_abi_converter
000000000034f290 mov x8, x0
000000000034f294 add x0, sp, #0xac
000000000034f298 mov x1, sp
000000000034f29c mov x2, x19
000000000034f2a0 blr x8
000000000034f2a4 ldr x8, [x20, #0x10]
000000000034f2a8 str x21, [x20, #0x8]
000000000034f2ac add x9, x8, #0x19
000000000034f2b0 stlrb w23, [x9]
000000000034f2b4 ldr x8, [x8, #0x10]
000000000034f2b8 ldr xzr, [x8]
000000000034f2bc ldp x29, x30, [sp, #0xe0]
000000000034f2c0 ldp x20, x19, [sp, #0xd0]
000000000034f2c4 ldp x22, x21, [sp, #0xc0]
000000000034f2c8 ldp x24, x23, [sp, #0xb0]
000000000034f2cc add sp, sp, #0xf0
000000000034f2d0 ret
On the Linux server:
0000000001cde070 <test_large_struct>:
1cde070: 55 push %rbp
1cde071: 48 89 e5 mov %rsp,%rbp
1cde074: 48 89 f2 mov %rsi,%rdx
1cde077: 48 8d 05 82 1f 8b 01 lea 0x18b1f82(%rip),%rax # 3590000 <_GLOBAL_OFFSET_TABLE_>
1cde07e: 48 b9 c0 61 4e 00 00 movabs $0x4e61c0,%rcx
1cde085: 00 00 00
1cde088: 48 8b 04 08 mov (%rax,%rcx,1),%rax
1cde08c: 48 8d 75 10 lea 0x10(%rbp),%rsi
1cde090: 5d pop %rbp
1cde091: ff e0 jmpq *%rax
1cde093: 66 66 66 66 2e 0f 1f data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
1cde09a: 84 00 00 00 00 00
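As far as the x86_64 listing can be read, it looks like a small forwarding stub. It moves the incoming `%rsi` (the pointer argument) into `%rdx`, loads the address `0x10(%rbp)` into `%rsi`, and tail-jumps to a target loaded through the GOT. Under the System V x86_64 convention, `0x10(%rbp)` after `push %rbp` is where the first stack-passed argument sits, which would be the `byval` struct copy. A rough C model of that shape, with hypothetical names (`forward_large`, `large_impl`) and a made-up 168-byte `struct Large`, might be:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 168-byte stand-in; the real fields are not shown in this post. */
struct Large {
    double values[21];
};

/* Hypothetical inner function: receives the struct by address. */
static void large_impl(int tag, const struct Large *s, double *out)
{
    assert(s != NULL && out != NULL);
    *out = (double)tag + s->values[0] + s->values[20];
}

/* C-facing entry point: receives the struct by value and forwards
 * the address of its own copy, leaving the other arguments unchanged. */
void forward_large(int tag, struct Large s, double *out)
{
    large_impl(tag, &s, out);
}
```

This is only a model of the argument shuffling, not of what the Julia side does with the data, so it cannot by itself say whether the stub is right or wrong.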
Does anyone have thoughts on this? I can’t tell if I have a stupid mistake somewhere, or if I’ve uncovered a bug in Julia or PackageCompiler.jl…