Does a Task preallocate reserved_stack::Int virtual bytes for its stack, or is it an upper limit for a dynamically resizing stack? Does it diverge from how stack size is managed for calls in the main thread? If it’s version-dependent or OS-dependent, I only need 1.12 on Windows but would like to hear the variation.
Hopefully someone who knows more (especially about Windows) can chime in, but for now I can at least share my understanding.
I’m pretty sure that the bytes are pre-allocated and that we don’t have a dynamically resizing stack (which is why you can get a stack overflow error). Your OS might deliver that memory in pages though, so even if you request more memory than you have, it might not be ‘active’ until touched (this is OS-dependent though).
As for whether it diverges from how the stack is managed in the main thread: pretty sure no. I don’t think there’s anything special about the ‘main’ task.
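For reference, the `reserved_stack` in question is the optional second positional argument of the `Task` constructor (the same form as the `Task(f, 1)` call used elsewhere in this thread). A minimal sketch, where the 8 MiB figure and the names `work` and `t` are arbitrary choices for illustration; it says nothing about whether that memory is committed up front:

```julia
# Hypothetical workload; any zero-argument function will do.
work() = sum(1:10)

# Second positional argument: requested stack size in bytes (8 MiB chosen arbitrarily).
t = Task(work, 8 * 1024 * 1024)

schedule(t)
result = fetch(t)   # waits for the task and returns the value of work()
println(result)
```

Passing `0` (the default) asks for the default stack size instead.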
A dynamically resizing stack with an upper limit would also cause stack overflow errors, but I share that intuition because I would assume a dynamically resizing stack would allow a much more generous upper limit, like Go’s goroutine stacks going from an initial adaptive lower limit >=2KB to an upper limit of 1GB.
While there are no guarantees about it as far as I’m aware, Julia programs are currently written assuming that stack pointers are ‘stable’ throughout the lifetime delineated by the function body.
E.g. when you pass a stack allocated struct that’s bigger than the address-size to a non-inlined function, it is passed as a stack pointer. This means that as far as I’m aware, the stack can’t be dynamically grown, otherwise those pointers could become invalid if the stack was re-allocated.
For example:
```julia
julia> code_llvm(Tuple{Int, Int}) do x, y
           tup = (x, y)
           @noinline sum(tup)
       end
; Function Signature: var"#71"(Int64, Int64)
;  @ REPL[33]:2 within `#71`
define i64 @"julia_#71_5785"(i64 signext %"x::Int64", i64 signext %"y::Int64") #0 {
top:
  %"new::Tuple" = alloca [2 x i64], align 8
  store i64 %"x::Int64", ptr %"new::Tuple", align 8
  %0 = getelementptr inbounds i8, ptr %"new::Tuple", i64 8
  store i64 %"y::Int64", ptr %0, align 8
;  @ REPL[33]:3 within `#71`
  %1 = call i64 @j_sum_5787(ptr nocapture nonnull readonly %"new::Tuple")
  ret i64 %1
}
```
Notice that @j_sum_5787 is being called on an alloca’d pointer. If somewhere deep inside the callstack of @j_sum_5787, we did a bunch more stack allocations, and had to dynamically grow the stack, these pointers would become invalid.
For another example, MArray from StaticArrays.jl also makes library-level direct use of stack pointers, and it seems to work fine.
Do you understand what this comment means then? julia/base/io.jl at cf5f5ebff6dc579ffbb559e6a3b5ccda302307a8 · JuliaLang/julia · GitHub
My understanding was that Julia doesn’t have stable stack pointers.
You might find Increase default stack size limit on 64-bit systems · Pull Request #55185 · JuliaLang/julia and document the optional argument of the `Task` constructor · Issue #55005 · JuliaLang/julia · GitHub relevant.
No, I have no idea why that comment is there. I’d also worry that if it really is a correctness problem whether r is heap or stack allocated here, then this code might become problematic in Julia 1.13 or 1.14.
Previously, a mutable struct like Ref would be heap allocated if it crossed a non-inlined function boundary, but we now have interprocedural escape analysis, which would allow us to stack allocate that Ref even though there’s a @noinline there…
I don’t really know how anything is implemented, but IIRC this also involves register allocation and objects being split up and partially omitted. I’d hazard a guess that the Ref part is trying to keep each element in one piece for arbitrary AbstractArrays that may not even store elements; write(s::IO, A::StridedArray) instead gets to work relative to pointer(A).
Comment is essentially repeated at the definition too. Same comments for unsafe_read.
```julia
# excerpt from base/io.jl (call site)
r = Ref{eltype(A)}()
for a in A
    r[] = a
    nb += @noinline unsafe_write(s, r, Core.sizeof(r)) # r must be heap-allocated
end

# excerpt from base/io.jl (definition)
@noinline unsafe_write(s::IO, p::Ref{T}, n::Integer) where {T} =
    unsafe_write(s, unsafe_convert(Ref{T}, p)::Ptr, n) # mark noinline to ensure ref is gc-rooted somewhere (by the caller)
```
I think there might be something tricky going on when the task yields during the unsafe_write. Maybe task switching can temporarily move the paused task’s stack somewhere else while it is not running, but I don’t know why that would be needed.
Yeah I have no idea how tasks are implemented. I don’t even know why a stack overflow doesn’t happen sooner when I try to reserve 1 byte for a task’s stack.
```julia
julia> f() = f(); f()
Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
ERROR: StackOverflowError:
Stacktrace:
 [1] f() (repeats 79984 times)
   @ Main .\REPL[152]:1

julia> schedule(Task(f, 1)) # expected 0-1 calls
Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
Task (failed) @0x0000019dd4d81000
StackOverflowError:
Stacktrace:
 [1] f() (repeats 2464 times)
   @ Main .\REPL[152]:1
```
Other than a floor of ~12800 bytes and each call being 48 bytes, the step changes in thresholds don’t make much sense to me:
| reserved_stack min | reserved_stack max | repeated times | Δ repeated times |
|---|---|---|---|
| 1 | 2^17 | 2464 | |
| 2^17+1 | 2^17+2^16 | 3829 | +1365 |
| 2^17+2^16+1 | 2^18 | 5195 | +1366 |
| 2^18+1 | 2^18+2^17 | 7925 | +2730 |
| 2^18+2^17+1 | 2^19 | 10656 | +2731 |
| 2^19+1 | 2^19+2^18 | 16117 | +5461 |
| 2^19+2^18+1 | 2^20 | 21579 | +5462 |
| 2^20+1 | 2^20+2^19+2^10 | 32523 | +10944 |
| 2^20+2^19+2^10+1 | 2^21 | 43424 | +10901 |
| 2^21+1 | 2^21+2^20 | 65269 | +21845 |
| 2^21+2^20+1 (or 0) | ? (2^34 slow) | 79984 | +14715 |
The last 3 rows’ stacks seem to be effectively 2MiB, 3MiB, and 3.6736MiB. I have no idea what’s special about that, besides confirming that the 4 → 8MB boost didn’t make it to Windows.
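The arithmetic behind those effective sizes can be sketched as follows, taking the two figures from the post as assumptions (roughly 48 bytes per call of `f` and roughly 12800 bytes of fixed overhead); the names `FRAME_BYTES`, `OVERHEAD_BYTES`, and `implied_stack_bytes` are made up for illustration:

```julia
# Assumed from the measurements above, not from Julia's source:
const FRAME_BYTES = 48          # stack bytes consumed per call of f
const OVERHEAD_BYTES = 12_800   # bytes used before the first call of f

# Stack size implied by an observed recursion depth.
implied_stack_bytes(repeats) = repeats * FRAME_BYTES + OVERHEAD_BYTES

# (upper end of a reserved_stack range, observed repeat count) pairs from the table.
rows = [(2^17, 2464), (2^18, 5195), (2^21, 43424), (2^21 + 2^20, 65269)]

for (upper, repeats) in rows
    est = implied_stack_bytes(repeats)
    println(upper, " bytes requested, ", est, " bytes implied, difference ", upper - est)
end

# Default stack (reserved_stack = 0), expressed in MiB.
default_mib = implied_stack_bytes(79984) / 2^20
println(default_mib)
```

Under those assumptions the implied size lands within a few dozen bytes of the upper end of each range, which is consistent with the request being rounded up to that value.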
Ah, I think I see what’s going on. This seems like just a bad alternative to GC.@preserve. It seems to be operating under the assumption that if unsafe_write is not inlined, then p is heap allocated, and there won’t be an opportunity to free p from the heap until after unsafe_write finishes running.
I’m not an expert on this, but I think this should have just been a GC.@preserve, and it shouldn’t matter if p is on the stack or the heap (though if there’s a memory safety problem here, it’d be much more deterministic on the stack, and harder to track down if it occurs on the heap).
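To make the suggestion concrete, here is a rough sketch of what rooting a Ref with GC.@preserve around a raw-pointer write can look like. This is not the code in Base and not the diff of any PR; `write_via_ref` is a hypothetical helper, and it only addresses keeping the Ref alive, not where the Ref is allocated:

```julia
# Hypothetical helper: write the raw bytes of one isbits value to an IO.
function write_via_ref(io::IO, x::T) where {T}
    isbitstype(T) || error("only isbits values are supported in this sketch")
    r = Ref{T}(x)
    # GC.@preserve keeps r rooted for as long as the raw pointer is in use.
    GC.@preserve r begin
        p = Base.unsafe_convert(Ptr{T}, r)
        unsafe_write(io, Ptr{UInt8}(p), sizeof(T))
    end
end

io = IOBuffer()
nwritten = write_via_ref(io, Int32(1))
bytes = take!(io)
println(nwritten, " bytes written: ", bytes)
```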
I’ve submitted a PR proposing to fix this; we’ll see what the experts have to say: Make `unsafe_read` and `unsafe_write` GC preserve `Ref` arguments · Pull Request #61901 · JuliaLang/julia
Either way, I don’t think this has anything to do with the questions in this thread.
Edit: Some additional context after looking through the git history: the commit that introduced the idea of marking the function @noinline in order to gc-root the Ref is from Jan 2016: add unsafe_read(io, p::Ptr{UInt8}, nb::UInt) counterpart to unsafe_write · JuliaLang/julia@8b97743 · GitHub
which is more than a year and a half older than the commit which introduced (the ancestor of) GC.@preserve: Fancier GC preserve intrinsics · JuliaLang/julia@d68e42f · GitHub.
Another update on this: it turns out that there’s an undocumented environment variable, JULIA_COPY_STACKS, which breaks the assumption that stack pointers can be passed between tasks, and I guess the heap allocation is there to protect against this.
The flag forces this behavior, but it can also happen without the flag if Julia tries to create a mmapped stack for a new task but the mmap fails. Some systems have low limits on address space and number of open mmaps.
Interesting, good to know!
This behaviour seems quite problematic to me. It means that whether or not a pointer is allowed to cross a task-boundary depends on whether or not the optimizer decides to stack-allocate the object, but that’s an implementation detail which is unstable…
I also think it’s kinda nuts that this isn’t documented anywhere as far as I can tell.