Fair enough
So the rough answer is: the Julia compiler is very bad at dealing with large immutable structs and large NTuples.
A mutable struct containing a large NTuple becomes problematic as soon as you access that NTuple member (or a large immutable struct member) as a whole.
This is not a complete show-stopper: the safe way to access elements of a large NTuple member is via pointer arithmetic.
For example:
julia> struct LargeStruct
           size::UInt32
           data::NTuple{1024*1024*4,UInt32}
       end
julia> r=Ref{LargeStruct}();
julia> typeof(r)
Base.RefValue{LargeStruct}
julia> unsafe_load(convert(Ptr{UInt32}, pointer_from_objref(r)),1)
0x00000000
julia> r[];
ERROR: StackOverflowError:
Stacktrace:
[1] top-level scope
@ REPL[7]:1
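The pointer-arithmetic access mentioned above can be sketched roughly like this. The helpers `load_data` and `store_data!` are hypothetical names (not part of Base or any package), and the sketch assumes the `LargeStruct` layout from the example, using `fieldoffset` to find where `data` starts inside the `Ref`:

```julia
const NDATA = 1024*1024*4

struct LargeStruct
    size::UInt32
    data::NTuple{NDATA,UInt32}
end

# Hypothetical helper: pointer to the first element of `data` inside the Ref.
# The caller must keep `r` alive (GC.@preserve) while the pointer is in use.
function data_pointer(r::Base.RefValue{LargeStruct})
    base = pointer_from_objref(r)                  # address of the inline LargeStruct
    return convert(Ptr{UInt32}, base + fieldoffset(LargeStruct, 2))
end

# Read data[i] without ever loading the whole tuple.
function load_data(r::Base.RefValue{LargeStruct}, i::Integer)
    1 <= i <= NDATA || throw(ArgumentError("index $i out of range 1:$NDATA"))
    GC.@preserve r unsafe_load(data_pointer(r), i)
end

# Write data[i] in place; the Ref is mutable, so this is visible to later loads.
function store_data!(r::Base.RefValue{LargeStruct}, v, i::Integer)
    1 <= i <= NDATA || throw(ArgumentError("index $i out of range 1:$NDATA"))
    GC.@preserve r unsafe_store!(data_pointer(r), convert(UInt32, v), i)
    return r
end

r = Ref{LargeStruct}();          # uninitialized memory, contents are arbitrary
store_data!(r, 0xdeadbeef, 17);
load_data(r, 17)
```

The point of the design is that `r[]` and `r[].data` never appear: only a single `UInt32` is ever loaded or stored, so nothing of size 16 MB lands on the stack.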
PS. For an example of how to do this right, look at the implementation of StaticArrays / MVector.
julia> using StaticArrays
julia> mv = MVector{1<<22, UInt32}(undef);
julia> mv
4194304-element MVector{4194304, UInt32} with indices SOneTo(4194304):
0x00000000
....
julia> mv.data
ERROR: StackOverflowError:
Stacktrace:
[1] top-level scope
@ REPL[5]:1
julia> mv.data[17]
ERROR: StackOverflowError:
MVector works splendidly as long as you stick to its own interface (e.g. mv[17]), because then the problematic large immutable tuple is never touched. Only direct access to mv.data blows up.
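To illustrate the idea behind such an interface, here is a simplified sketch of a mutable wrapper whose `getindex` / `setindex!` go through a pointer instead of the `data` field. `BigBuffer` is a hypothetical type made up for this post, not the actual StaticArrays source, and it assumes `T` is an isbits type and `data` is the first (and only) field:

```julia
# Hypothetical MVector-like container; assumes T is isbits.
mutable struct BigBuffer{N,T}
    data::NTuple{N,T}
    # Incomplete initialization: `data` is left as arbitrary memory.
    function BigBuffer{N,T}() where {N,T}
        isbitstype(T) || throw(ArgumentError("T must be an isbits type"))
        return new{N,T}()
    end
end

Base.length(::BigBuffer{N}) where {N} = N

# Element access by pointer arithmetic; the tuple itself is never loaded.
function Base.getindex(b::BigBuffer{N,T}, i::Int) where {N,T}
    1 <= i <= N || throw(ArgumentError("index $i out of range 1:$N"))
    GC.@preserve b unsafe_load(convert(Ptr{T}, pointer_from_objref(b)), i)
end

function Base.setindex!(b::BigBuffer{N,T}, v, i::Int) where {N,T}
    1 <= i <= N || throw(ArgumentError("index $i out of range 1:$N"))
    GC.@preserve b unsafe_store!(convert(Ptr{T}, pointer_from_objref(b)), convert(T, v), i)
    return b
end

b = BigBuffer{1<<22,UInt32}();   # keep the `;` so the REPL does not try to show b.data
b[17] = 0x2a;
b[17]
```

As with MVector, this only stays safe if nobody reaches around the interface and touches `b.data` directly.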
The difference to C is:
In Julia, an access like mv.data[17] conceptually first loads data, then takes the 17th element. At that point you have already lost, because "load data" explodes in your face.
In C, the corresponding thing would look like
typedef struct { int data[1<<22]; } BigThing;
int foo(BigThing* something) {
    return something->data[17];
}
which conceptually does pointer-arithmetic first and then loads the single data element. Cf godbolt clang17 with -O0 -S -emit-llvm:
%struct.BigThing = type { [4194304 x i32] }
define dso_local noundef i32 @foo(BigThing*)(ptr noundef %something) #0 !dbg !225 {
entry:
  %something.addr = alloca ptr, align 8
  store ptr %something, ptr %something.addr, align 8
  call void @llvm.dbg.declare(metadata ptr %something.addr, metadata !238, metadata !DIExpression()), !dbg !239
  %0 = load ptr, ptr %something.addr, align 8, !dbg !240
  %data = getelementptr inbounds %struct.BigThing, ptr %0, i32 0, i32 0, !dbg !241
  %arrayidx = getelementptr inbounds [4194304 x i32], ptr %data, i64 0, i64 17, !dbg !240
  %1 = load i32, ptr %arrayidx, align 4, !dbg !240
  ret i32 %1, !dbg !242
}
clang does not emit a naive load of the whole array followed by an element access. It uses getelementptr to do the pointer arithmetic first, then loads the single element, even with -O0.
This is a conceptual limitation of Julia. (foo(m) = m.data[17] doesn't stack-overflow when compiled, because the load of the whole tuple is optimized out. If type inference fails, or the code runs in the REPL / interpreter, it does overflow.)