Tracing memory occupied by existing variables

Hi All,

We are searching for the reason why our server eventually consumes all memory and crashes, a topic discussed here: Poor performance of garbage collection in multi-threaded application
We have tried to run the GC manually, but it does not seem to help. The effect we observe is very non-deterministic and we do not really know what is going on.
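
For reference, a minimal sketch of what "running the GC manually" and checking its effect could look like. The helper name memory_report is made up for this illustration. Note that Base.gc_live_bytes and Base.format_bytes are unexported Base functions and may change between Julia versions, and that Sys.maxrss reports the peak resident set size, not the current one.

```julia
# Hypothetical helper: force a full collection and report what the GC
# considers live next to what the OS has handed to the process.
function memory_report()
	GC.gc(true)                   # full collection, not just the young generation
	live = Base.gc_live_bytes()   # bytes the GC currently considers live
	rss = Sys.maxrss()            # peak resident set size of this process, in bytes
	println("GC live bytes: ", Base.format_bytes(live))
	println("Peak RSS:      ", Base.format_bytes(rss))
	return (live = live, rss = rss)
end

memory_report()
```

If the live bytes stay roughly flat while the resident size keeps growing, the memory is probably not held by reachable Julia objects, and walking the variables of all modules would not find it.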

As a next step, we are considering reaching all allocated objects in all modules and computing their sizes, so that we can tell whether there is something we have not seen.
I would like to ask whether the code snippet below makes sense.

function memtracer(m::Module, depth = 0, traced = Set{Module}())
	# Mark this module as visited so that cycles (e.g. Main.Main) are skipped.
	# The set holds modules, not names, and is shared across recursive calls.
	push!(traced, m)
	# Only look at names that are actually bound; getfield throws otherwise
	defined = filter(n -> isdefined(m, n), names(m))
	for n in defined
		x = getfield(m, n)
		if x isa Module && x ∉ traced
			memtracer(x, depth + 1, traced)
		end
	end

	# Keep plain variables: drop modules, functions and types
	vars = filter(n -> !(getfield(m, n) isa Union{Module, Function, Type}), defined)
	isempty(vars) && return nothing
	println(repeat(" ", depth), "Module: ", m)
	mems = map(n -> Base.summarysize(getfield(m, n)), vars)
	# Largest variables first
	stats = sort(collect(zip(vars, mems)), by = last, rev = true)
	l = maximum(length(string(v)) for v in vars) + depth + 4
	for (n, s) in stats
		println(rpad(repeat(" ", depth + 4) * string(n), l), "  ", s)
	end
	nothing
end

memtracer(Main)

Or, if someone can suggest a different strategy to achieve this, using some nice mechanism we do not know about, we would be very happy.

Thanks to all in advance,
Tomas

Are you by any chance reading large data from disk?

I think your approach is quite similar to this function from the InteractiveUtils standard library:

InteractiveUtils.varinfo(Module, recursive = true, all = true, sortby = :size, minsize = 1000)

(minsize was added in 1.8)

https://docs.julialang.org/en/v1.8-dev/stdlib/InteractiveUtils/#InteractiveUtils.varinfo
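
A small sketch of how that call could be used in practice. The variable names big_buffer and small_value are made up for this illustration, and the minsize keyword assumes Julia 1.8 or newer. The call uses @__MODULE__, which is Main when the code is run from the REPL or as a script.

```julia
using InteractiveUtils   # provides varinfo; loaded automatically in the REPL

# Two illustrative globals: only the first is larger than the minsize threshold
big_buffer = zeros(UInt8, 10_000)
small_value = 1

# List the variables of the current module (Main in the REPL) and of its
# submodules, largest first, hiding anything smaller than 1000 bytes
report = varinfo(@__MODULE__; recursive = true, all = true, sortby = :size, minsize = 1000)
display(report)
```

The result is a Markdown table, so it prints nicely in the REPL. Like the hand-written tracer, it only sees objects reachable from module-level bindings.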

I took inspiration from there, but did not know that you can call it like this. Thanks a lot.

Yes, we do.
Is there something that can go wrong there?

I’ve seen this issue popping up from time to time over several years, always related to reading data from files. I reported it here

And investigations led to this issue

Interesting, I will take a look at that. Thanks a lot.
Very valuable.

Do you think that using Julia with musl instead of glibc would help?

Note that we occasionally have some trouble with the musl dynamic loader: https://github.com/JuliaLang/julia/issues/40556. But there is a suggested workaround there (it involves recompiling the libc).