Hi everyone! I’m Rainer. I’m preparing a GSoC proposal for the Out-of-Core Computing project for Dagger.jl.
I come from a background in mechatronics and embedded systems, where working under strict hardware memory constraints and optimizing low-level routines is something I really enjoy (I've also been working a bit with CUDA.jl recently), so this project caught my eye immediately.
Before I write up a massive architectural document, I want to get my hands dirty and prove out the core mechanics. My plan for the next few days is to:
- Write a minimal reproducible example (MRE) that intentionally forces an out-of-memory (OOM) crash in Dagger by mapping over massive arrays.
- Draft a highly simplified local prototype to intercept the `Chunk` lifecycle and “spill/fetch” data to a local NVMe drive using serialization when a hardcoded memory limit is hit.
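To make the spill/fetch idea concrete before touching Dagger, here is a minimal sketch in plain Julia, assuming oldest-first eviction and using `Base.summarysize` as a rough size estimate. It does not use Dagger at all: `SpillCache`, `store!`, `spill!`, and `fetch_chunk!` are hypothetical names for illustration, not part of Dagger's API, and the byte budget is hardcoded as described in the plan.

```julia
using Serialization

# Hypothetical spill cache (NOT Dagger's API): keeps values in memory until a
# hardcoded byte budget is exceeded, then serializes the oldest entries to disk.
mutable struct SpillCache
    limit::Int                 # hardcoded in-memory budget, in bytes
    used::Int                  # estimated bytes currently held in memory
    mem::Dict{Int,Any}         # key => in-memory value
    order::Vector{Int}         # in-memory keys, oldest first
    disk::Dict{Int,String}     # key => path of the spilled file
    dir::String                # spill directory (e.g. on a local NVMe drive)
end

SpillCache(limit::Int; dir::String = mktempdir()) =
    SpillCache(limit, 0, Dict{Int,Any}(), Int[], Dict{Int,String}(), dir)

# Spill the oldest in-memory entries until usage is back under the budget.
function spill!(c::SpillCache)
    while c.used > c.limit && !isempty(c.order)
        key = popfirst!(c.order)
        val = pop!(c.mem, key)
        path = joinpath(c.dir, "chunk_$key.jls")
        serialize(path, val)                 # write the value to disk
        c.disk[key] = path
        c.used -= Base.summarysize(val)
    end
    return c
end

# Admit a new value to memory, spilling older entries if the budget is exceeded.
function store!(c::SpillCache, key::Int, val)
    (haskey(c.mem, key) || haskey(c.disk, key)) && error("duplicate key $key")
    c.mem[key] = val
    push!(c.order, key)
    c.used += Base.summarysize(val)
    return spill!(c)
end

# Return a value, reading it back from disk (and re-admitting it) if it was spilled.
function fetch_chunk!(c::SpillCache, key::Int)
    haskey(c.mem, key) && return c.mem[key]
    path = pop!(c.disk, key)                 # throws KeyError for unknown keys
    val = deserialize(path)
    rm(path)
    store!(c, key, val)                      # may spill other (older) entries
    return val
end
```

With a ~1 MB budget and four `rand(50_000)` vectors (~400 KB each), the two oldest end up on disk, and fetching one of them back evicts the next-oldest in-memory entry. A real prototype would need to replace the plain `Dict` storage with however Dagger holds `Chunk` data on each worker, which is the part I still need to learn.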
I was thinking of opening a GitHub issue to document the baseline OOM script and track my progress on the local-disk prototype.
Does this sound like a productive first step to the maintainers? Also, are there specific files in the scheduler’s source code you’d recommend I focus on first to understand how memory pressure is currently tracked per worker?
Thanks!
Rainer