Passing a large Julia struct to C

Hello, I want to pass an extremely large struct to a C interface. For example:

mutable struct LargeStruct
    size::UInt32
    data::NTuple{1024*1024*4,UInt32}
end

When I try to create such a large struct, a stack overflow error occurs. Is there any way to fix this? For example, the code below reports a stack overflow:

LargeStruct(1024*1024*4,Tuple(zeros(1024*1024*4)))

If I change the struct definition and reduce the tuple size to 1024, then the code above (with the tuple size also changed to 1024) works.

Do you have an example that we can run, showing the stackoverflow? What error do you get?

Hello, I edited the question and added the example code to it.


Why would you need such a large struct? Storing such a large object on the stack (instead of heap) could result in stack overflow. This is not an issue with C at all. The problem started when you tried to have an extremely large struct in the first place.

I need a large struct to be passed to a C interface. This is required by the .so lib. I tried to create such a struct on the heap but didn’t find a way. I could also create a large vector with the same memory layout, but member access would not be elegant.

I’d suggest you do that, i.e.

struct MyWrapperStruct
    size_and_data::Vector{UInt32} # has (1<<22) + 1 elements, first is the size
end

Then you can extend the various getindex / setindex! / getproperty / setproperty! / length / pointer methods to make access elegant. In the C world, the interface almost surely looks like foo(LargeStruct* ptr), so you then ccall with pointer(myWrapperStruct), which is implemented as Base.pointer(w::MyWrapperStruct) = pointer(w.size_and_data).
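A minimal sketch of that wrapper idea, shrunk to 16 data elements so it is cheap to run. The names VecWrapper and N are made up for illustration, and the ccall in the comment refers to a hypothetical library and function, not to anything from this thread:

```julia
const N = 16  # small stand-in for 1<<22

struct VecWrapper
    size_and_data::Vector{UInt32}  # N + 1 elements; element 1 plays the role of `size`
end
VecWrapper() = VecWrapper(zeros(UInt32, N + 1))

# Forward pointer/length/indexing to the backing vector, skipping the size slot.
# (No bounds check on `i` here; a real wrapper would want one.)
Base.pointer(w::VecWrapper) = pointer(getfield(w, :size_and_data))
Base.length(w::VecWrapper) = N
Base.getindex(w::VecWrapper, i::Integer) = getfield(w, :size_and_data)[i + 1]
Base.setindex!(w::VecWrapper, v, i::Integer) = (getfield(w, :size_and_data)[i + 1] = v)

w = VecWrapper()
w[3] = 7

# data[3] sits in the 4th UInt32 of the buffer, which is what C would see.
backing = getfield(w, :size_and_data)
fourth_word = GC.@preserve backing unsafe_load(pointer(w), 4)

# Hypothetical C call, assuming `void foo(LargeStruct *ptr)` in "libexample":
# ccall((:foo, "libexample"), Cvoid, (Ptr{UInt32},), backing)
```

Passing the Vector itself to ccall, as in the comment, lets Julia keep the buffer rooted for the duration of the call, which is usually less error-prone than passing a raw pointer around.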

Depending on context, you might prefer Matrix{UInt32} with size(size_and_data) == ((1<<22) + 1, 1) – this tells julia that the thing cannot be resized.

In the new world (Julia 1.11 and later), you may consider Memory{UInt32}.

Codegen hates large tuples.

I can reproduce the stackoverflow, seems to be due to the implicit convert happening:

julia> convert(NTuple{1024*1024*4, UInt32}, Tuple(zeros(1024*1024*4)))
ERROR: StackOverflowError:
Stacktrace:
 [1] top-level scope
   @ REPL[7]:1

It’s a bit unclear what exactly is causing the stackoverflow there, so I’ve opened an issue for this particular error.

Note that Tuple(zeros(1024*1024*4)) creates a tuple of Float64, not UInt32. If I call it like so:

julia> a = LargeStruct(1024*1024*4,Tuple(zeros(UInt32, 1024*1024*4)));

julia> sizeof(a)
16777220

the object is created correctly, confirming that the stackoverflow is due to the implicit convert the default constructor calls on its arguments.
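The implicit convert is easy to see at a harmless size. The sketch below uses a made-up TinyStruct with only 4 elements; it shows the element type mismatch and the conversion done by the default constructor, not the overflow itself:

```julia
struct TinyStruct
    size::UInt32
    data::NTuple{4,UInt32}
end

t_float = Tuple(zeros(4))          # zeros defaults to Float64
t_uint  = Tuple(zeros(UInt32, 4))  # already the declared field type

# The default constructor calls convert on each argument, so both calls succeed.
# Only the first has to convert a Float64 tuple into an NTuple{4,UInt32}.
a = TinyStruct(4, t_float)
b = TinyStruct(4, t_uint)
```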

I do wonder though whether this struct actually matches the struct on the C side - do you happen to have the struct definition on the C side? As is, the struct ends up being just over 16MiB large, which seems awfully large to me.


For C interop, that won’t be desirable since Memory is not allocated inline in structs, unlike the NTuple.
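A small sketch of the layout difference. It uses a Vector field as a stand-in for Memory so that it also runs on Julia versions before 1.11; in both cases the field holds a reference to a separately allocated object. The struct names are made up for illustration:

```julia
struct InlineTuple
    size::UInt32
    data::NTuple{4,UInt32}   # stored inline, like `uint32_t data[4]` in C
end

struct HeapVector
    size::UInt32
    data::Vector{UInt32}     # stored as a reference to a separate heap object
end

inline_bytes  = sizeof(InlineTuple)      # 4 bytes for size + 4 * 4 bytes of data
inline_isbits = isbitstype(InlineTuple)  # plain bits, C-compatible layout
heap_isbits   = isbitstype(HeapVector)   # contains a reference, so not plain bits
```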


The point is that you end up with

struct WrapL
    size_and_data::Memory{UInt32}
end

function WrapL()
    res = WrapL(Memory{UInt32}(undef, 1 + (1<<22)))
    res.size_and_data .= 0
    res
end

function Base.getproperty(w::WrapL, sym::Symbol)
    if sym == :size
        getfield(w, :size_and_data)[1]
    elseif sym == :data
        # do you instead want view(getfield(w, :size_and_data), 2:(w.size+1))?
        view(getfield(w, :size_and_data), 2:(1+(1<<22)))
    else
        getfield(w, sym)
    end
end

function Base.setproperty!(w::WrapL, sym::Symbol, s)
    if sym == :size
        getfield(w, :size_and_data)[1] = s
    else
        Base.setfield!(w, sym, s)
    end
end

In other words, you reproduce the desired C layout inside your Memory.

This is OK, because the C world will probably only ever deal with pointers to LargeStruct. 4MB on-stack buffers / value-types are just plain obnoxious. Take a look at e.g. https://devblogs.microsoft.com/oldnewthing/20220203-00/?p=106215 and now think about what a 16MB stack allocation that crosses the guard page does. Your compiler probably needs to walk down that array and touch every page on the way, in order to trigger a pagefault on the guard page before a pagefault beyond the guard page is triggered! And if your compiler doesn’t do that, boom.

That only works in special cases where the element type of the NTuple and the other fields of your struct are the same and there’s no padding between the data and other fields of the struct; for more complicated structs, that no longer works.

Of course it works, just do the necessary pointer arithmetic in your custom getproperty / setproperty! methods.
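As a rough sketch of what that pointer arithmetic could look like for a struct with mixed field types and padding: the Layout type below is a made-up, deliberately small mirror of a C struct. It is used only to query offsets via fieldoffset, while the data lives in a plain byte buffer.

```julia
# Hypothetical mirror of `struct { uint8_t flag; double values[4]; }`.
struct Layout
    flag::UInt8
    values::NTuple{4,Float64}
end

off_flag   = fieldoffset(Layout, 1)  # byte offset of `flag`
off_values = fieldoffset(Layout, 2)  # byte offset of `values`, after padding

buf = zeros(UInt8, sizeof(Layout))   # heap buffer with the same size as the struct

third = GC.@preserve buf begin
    p = pointer(buf)
    unsafe_store!(convert(Ptr{UInt8}, p + off_flag), 0x01)
    # Write and read back values[3] (1-based) without materializing the tuple.
    unsafe_store!(convert(Ptr{Float64}, p + off_values), 2.5, 3)
    unsafe_load(convert(Ptr{Float64}, p + off_values), 3)
end
```

Taking the offsets from fieldoffset rather than hard-coding them keeps the padding rules in one place, on the assumption that the Julia struct really matches the C declaration.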

Ah yes, doing pointer arithmetic as if we’re writing C - might as well just stick to C in that case and not have to worry about mutability or other Julia semantics at all…

This is getting mightily off topic, so let’s end that line of discussion :slight_smile:

The question was explicitly about interop with a data layout coming from a C library – i.e. how to best write a compatibility layer.

And my suggestion is to do pointer arithmetic in the compat layer instead of trying to use the julia object model for this specific kind of C struct. The end result has the same kind of semantics and ergonomics, but the julia compiler kinda sucks with giant tuples.

C also sucks if you try to pass giant structs by-value – you need to pass by pointer. This should be equivalent to a mutable struct containing the giant tuple, but julia makes a lot of effort to unpack the thing.

Something that doesn’t work this way is if your struct contains garbage-collected julia object references intermingled with giant memory blobs. I.e.

mutable struct ThisIsVeryBad
    lolIamGCControlled::Any
    payload::NTuple{1<<22, UInt32}
end

I’ve deliberately not delved into any specific suggestions for how an interop layer/struct should be declared in Julia because we haven’t gotten enough information or even gotten an indication about whether that’s desired. @vincentliu asked about why they’re getting a stackoverflow, so that’s what I’m focusing on. As shown above, the stackoverflow is unrelated to any ccall. We just don’t know whether the example in the OP is a minimized version of some other code, just some exploratory coding or some other thing, so jumping in with declarative “write a custom getproperty and do pointer arithmetic” seems very premature and off topic to me.


Fair enough :wink:

So the rough answer is: The julia compiler is very very bad at dealing with large immutable structs and large NTuple.

A mutable struct containing a large NTuple becomes problematic if you access its large NTuple member (or a large immutable struct member).

This is not a complete show-stopper: the correct, safe way to access large NTuple members is via pointer arithmetic.

For example:

julia> struct LargeStruct
           size::UInt32
           data::NTuple{1024*1024*4,UInt32}
       end
julia> r=Ref{LargeStruct}();

julia> typeof(r)
Base.RefValue{LargeStruct}

julia> unsafe_load(convert(Ptr{UInt32}, pointer_from_objref(r)),1)
0x00000000

julia> r[];
ERROR: StackOverflowError:
Stacktrace:
 [1] top-level scope
   @ REPL[7]:1
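The same kind of access can be tried at a size where nothing overflows. This is only a sketch with a made-up SmallStruct and helper names; it reads and writes single elements of the tuple field of a mutable struct through a pointer, without ever loading the whole tuple:

```julia
mutable struct SmallStruct
    size::UInt32
    data::NTuple{8,UInt32}
end

# Pointer to the first element of `data` inside the heap-allocated object.
data_ptr(s::SmallStruct) =
    convert(Ptr{UInt32}, pointer_from_objref(s) + fieldoffset(SmallStruct, 2))

# Load / store one element (1-based index) while keeping `s` rooted.
load_elem(s::SmallStruct, i) = GC.@preserve s unsafe_load(data_ptr(s), i)
store_elem!(s::SmallStruct, v, i) =
    GC.@preserve s unsafe_store!(data_ptr(s), convert(UInt32, v), i)

s = SmallStruct(8, ntuple(i -> UInt32(i), 8))
before = load_elem(s, 3)
store_elem!(s, 42, 3)
after = load_elem(s, 3)
```

This relies on SmallStruct being mutable, so that it has a stable address; pointer_from_objref is not meant for immutable values.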

PS. For an example how it is done right, look at the implementation of StaticArrays / MVector.

julia> using StaticArrays
julia> mv = MVector{1<<22, UInt32}(undef);

julia> mv
4194304-element MVector{4194304, UInt32} with indices SOneTo(4194304):
 0x00000000
....
julia> mv.data
ERROR: StackOverflowError:
Stacktrace:
 [1] top-level scope
   @ REPL[5]:1
julia> mv.data[17]
ERROR: StackOverflowError:

This works splendidly, because the problematic large immutable structs are never touched if you stick to the MVector interfaces.

The difference to C is:

In julia, an access like mv.data[17] conceptually first loads data, then takes the 17th element. At that point, you’re already fucked because “load data” explodes in your face.

In C, the corresponding thing would look like

typedef struct{ int data[1<<22];} BigThing;

int foo(BigThing* something){
    return something->data[17];
}

which conceptually does pointer-arithmetic first and then loads the single data element. Cf godbolt clang17 with -O0 -S -emit-llvm:

%struct.BigThing = type { [4194304 x i32] }

define dso_local noundef i32 @foo(BigThing*)(ptr noundef %something) #0 !dbg !225 {
entry:
  %something.addr = alloca ptr, align 8
  store ptr %something, ptr %something.addr, align 8
  call void @llvm.dbg.declare(metadata ptr %something.addr, metadata !238, metadata !DIExpression()), !dbg !239
  %0 = load ptr, ptr %something.addr, align 8, !dbg !240
  %data = getelementptr inbounds %struct.BigThing, ptr %0, i32 0, i32 0, !dbg !241
  %arrayidx = getelementptr inbounds [4194304 x i32], ptr %data, i64 0, i64 17, !dbg !240
  %1 = load i32, ptr %arrayidx, align 4, !dbg !240
  ret i32 %1, !dbg !242
}

clang does not emit a naive load followed by access. It uses getelementptr to do pointer arithmetic first, then accesses the element. Even with -O0.

This is a julia conceptual limitation. (foo(m)=m.data[17]; doesn’t stack-overflow because it is optimized out. If type-inference fails or running in the REPL / interpreter, it does overflow.)

This does create the struct successfully, but accessing the struct easily triggers a stack overflow. For example:

s = LargeStruct(1024*1024*4,Tuple(zeros(UInt32,1024*1024*4)))
s.data[1:end] = ones(UInt32,1024*1024*4)
println(s.data[end])

Thanks to all of you. I think the only solution is to allocate the memory using Vector or Memory{UInt32} and manipulate the data using pointer arithmetic.

I’m really curious what the original C declaration looks like for the struct and the API function.


Actually, the original C declaration is the same as above. It does involve passing a large chunk of data. I think this is common when building a simulator, for example for a wireless communication system.

What? No, nothing about Julia-the-language mandates that data is copied around here. The compiler isn’t required to copy immutables. This is a bug, plain and simple, hence the issue I linked above.

What are you talking about? The code infers/compiles perfectly fine, you can inspect this with Cthulhu all you want. There’s likely some stack canary being overwritten here by accident that shouldn’t be, leading to a perceived stackoverflow. Why are you making such a strong & definitive claim without supporting evidence?

Here, I can even print all of a without issue:

julia> a = LargeStruct(1024*1024*4,Tuple(zeros(UInt32, 1024*1024*4)));

julia> show(stdout, a)
LargeStruct(0x00400000, (0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 [...]

which proves that this is not at all a “conceptual issue with Julia”. I cut the pasted part short here so that there isn’t 16MiB of data in discourse, which would hit the post length limit.


Yes, I already reported that as an issue on the issue tracker: