Fair enough 
So the rough answer is: the Julia compiler is very bad at dealing with large immutable structs and large NTuples.
A mutable struct containing a large NTuple becomes problematic as soon as you access its large NTuple member (or a large immutable struct member) as a whole value.
This is not a complete show-stopper: the safe way to access the elements of a large NTuple member is via pointer arithmetic.
For example:
julia> struct LargeStruct
           size::UInt32
           data::NTuple{1024*1024*4,UInt32}
       end
julia> r=Ref{LargeStruct}();
julia> typeof(r)
Base.RefValue{LargeStruct}
julia> unsafe_load(convert(Ptr{UInt32}, pointer_from_objref(r)),1)
0x00000000
julia> r[];
ERROR: StackOverflowError:
Stacktrace:
 [1] top-level scope
   @ REPL[7]:1
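To make the pointer-arithmetic route a bit more concrete, here is a minimal sketch that reads and writes single elements of `data` without ever loading the whole NTuple. The names `N`, `dataptr`, `getdata` and `setdata!` are made up for this illustration, and the code assumes the same `LargeStruct` layout as above (run it in a fresh session, since it redefines the struct).

```julia
const N = 1024 * 1024 * 4

struct LargeStruct
    size::UInt32
    data::NTuple{N,UInt32}
end

# Hypothetical helper: address of data[1] inside the Ref'd struct.
# fieldoffset returns the byte offset of field 2 (`data`), and Ptr + integer
# in Julia is byte arithmetic. The caller must keep `r` alive (GC.@preserve)
# for as long as the pointer is in use.
dataptr(r::Base.RefValue{LargeStruct}) =
    convert(Ptr{UInt32}, pointer_from_objref(r)) + fieldoffset(LargeStruct, 2)

# Read data[i]; unsafe_load(p, i) indexes from 1 in units of sizeof(UInt32).
function getdata(r::Base.RefValue{LargeStruct}, i::Integer)
    1 <= i <= N || throw(BoundsError())
    GC.@preserve r unsafe_load(dataptr(r), i)
end

# Write data[i]. Returns nothing so the REPL never tries to print `r`,
# which would load the whole struct.
function setdata!(r::Base.RefValue{LargeStruct}, x, i::Integer)
    1 <= i <= N || throw(BoundsError())
    GC.@preserve r unsafe_store!(dataptr(r), convert(UInt32, x), i)
    return nothing
end

r = Ref{LargeStruct}();   # contents are uninitialized
setdata!(r, 0x2a, 17)
getdata(r, 17)            # reads back the element just stored
```

Note that the bounds error deliberately carries no reference to `r`: printing an error that holds `r` would try to show the whole struct and run into the same problem.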
PS. For an example of how it is done right, look at the implementation of StaticArrays / MVector.
julia> using StaticArrays
julia> mv = MVector{1<<22, UInt32}(undef);
julia> mv
4194304-element MVector{4194304, UInt32} with indices SOneTo(4194304):
 0x00000000
....
julia> mv.data
ERROR: StackOverflowError:
Stacktrace:
 [1] top-level scope
   @ REPL[5]:1
julia> mv.data[17]
ERROR: StackOverflowError:
MVector itself works splendidly, because the problematic large immutable structs are never touched as long as you stick to the MVector interface. Only reaching into mv.data directly, as in the last two commands, blows up.
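For contrast, a short sketch of what sticking to the interface looks like: plain element indexing on the MVector rather than on its `data` field. This assumes the StaticArrays package is installed; my understanding is that for bits-type elements its `getindex` / `setindex!` go through single-element pointer loads and stores, but treat that as an assumption and check the package source.

```julia
using StaticArrays

mv = MVector{1<<22, UInt32}(undef);   # semicolon: don't print 4M elements

mv[17] = 0x0000002a   # setindex! on the MVector, not on mv.data
mv[17]                # getindex on the MVector reads a single element
length(mv)            # the length comes from the type parameter
```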
The difference from C is:
In Julia, an access like mv.data[17] conceptually first loads data, then takes the 17th element. At that point you have already lost, because “load data” explodes in your face.
In C, the corresponding thing would look like
typedef struct { int data[1<<22]; } BigThing;
int foo(BigThing* something) {
    return something->data[17];
}
which conceptually does the pointer arithmetic first and then loads the single data element. Compare the output of clang 17 on godbolt with -O0 -S -emit-llvm:
%struct.BigThing = type { [4194304 x i32] }
define dso_local noundef i32 @foo(BigThing*)(ptr noundef %something) #0 !dbg !225 {
entry:
  %something.addr = alloca ptr, align 8
  store ptr %something, ptr %something.addr, align 8
  call void @llvm.dbg.declare(metadata ptr %something.addr, metadata !238, metadata !DIExpression()), !dbg !239
  %0 = load ptr, ptr %something.addr, align 8, !dbg !240
  %data = getelementptr inbounds %struct.BigThing, ptr %0, i32 0, i32 0, !dbg !241
  %arrayidx = getelementptr inbounds [4194304 x i32], ptr %data, i64 0, i64 17, !dbg !240
  %1 = load i32, ptr %arrayidx, align 4, !dbg !240
  ret i32 %1, !dbg !242
}
clang does not emit a naive load followed by access. It uses getelementptr to do pointer arithmetic first, then accesses the element. Even with -O0.
This is a conceptual limitation of Julia. (A compiled function like foo(m) = m.data[17] doesn’t stack-overflow, because the load of the whole tuple is optimized out. If type inference fails, or if the code runs at the REPL top level / in the interpreter, it does overflow.)