The allocation profiler may be useful here to see where your application allocates memory at runtime. There’s a good JuliaCon 2022 talk on it: "Hunting down allocations with Julia 1.8's Allocation Profiler".
I don’t think this will be exactly what you want, and I’m interested to see what others suggest.
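In case it helps, here is a minimal sketch of what using the allocation profiler from the `Profile` standard library (Julia 1.8 or newer) looks like. `build_vectors` is a made-up stand-in for whatever code you want to inspect, and `sample_rate=1.0` records every allocation, which is only sensible for small tests like this one.

```julia
using Profile

# Hypothetical stand-in for the code you want to inspect.
function build_vectors(n)
    out = Vector{Vector{Float64}}()
    for i in 1:n
        push!(out, rand(100))   # each iteration allocates a new array
    end
    return out
end

build_vectors(10)   # run once first so compilation is not profiled

Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1.0 build_vectors(1_000)
results = Profile.Allocs.fetch()

# Sum the sampled bytes per allocated type.
bytes_by_type = Dict{Any,Int}()
for a in results.allocs
    bytes_by_type[a.type] = get(bytes_by_type, a.type, 0) + a.size
end

# Print the types that account for the most sampled bytes first.
for (T, bytes) in sort(collect(bytes_by_type); by = last, rev = true)
    println(rpad(string(T), 40), bytes, " bytes")
end
```

For a real run you would lower `sample_rate` a lot. I believe the talk visualises the results with PProf.jl (`PProf.Allocs.pprof()`), but check that package's README for the details.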
As a quick question, if you are using a single node, is it possible to use multithreading instead of MPI? I think Trixi supports that. I suspect it would have a much lower memory footprint and be less likely to run out of memory, because each MPI rank is a separate Julia process that loads and compiles its own copy of every package. I wouldn’t be surprised if each process uses at least 500-1000 MB, depending on the size of the packages loaded. That isn’t uncommon.
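To check the per-process figure on your own machine rather than rely on my guess, a rough sketch like the one below prints the peak resident memory of the current Julia process. Run it after loading your packages; `to_mib` is just a helper I made up for formatting. To try threads instead of MPI, start Julia with something like `julia --threads=4`.

```julia
# Helper (hypothetical name): convert bytes to MiB for printing.
to_mib(bytes) = round(bytes / 2^20; digits = 1)

peak  = Sys.maxrss()         # peak resident set size of this process, in bytes
total = Sys.total_memory()   # total physical memory of the machine, in bytes

println("Threads in this process: ", Threads.nthreads())
println("Peak resident memory:    ", to_mib(peak), " MiB")
println("Total system memory:     ", to_mib(total), " MiB")

# Very rough upper bound: how many processes of this size would fit in RAM.
println("Processes of this size that fit in RAM: ", total ÷ peak)
```

Multiplying the peak by the number of MPI ranks gives a ballpark for the total baseline footprint before any simulation data is allocated.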