Hello,
is it possible to manually increase pressure on the garbage collector? I have several workers whose job is to download large files from S3, load them, and preprocess them so that the master worker does not have to deal with this. But Julia crashes because all the memory is consumed, although this takes quite some time. I suspect that the garbage collector is not executed in time due to the distributed environment. Can I somehow verify this idea?
Thanks for any answers,
Tomas
You can use gc() to force a garbage collection; however, I don’t think that this is the root of your problem. Have you benchmarked your scripts to check the memory usage and peaks?
help?> gc
search: gc gcd gcdx gc_enable eigvecs eigfact eigfact! logspace getsockname
gc()
Perform garbage collection. This should not generally be used.
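If you want to test the hypothesis that collections simply happen too late on the workers, a minimal sketch would be to force one after each file is handled (download_and_load and preprocess below are hypothetical stand-ins for your own functions):

@everywhere function fetch_and_preprocess(url)
    data = download_and_load(url) # hypothetical: fetch the file from S3 and load it
    result = preprocess(data)     # hypothetical: your preprocessing step
    gc()                          # force a collection while the large buffers are dead (GC.gc() on Julia >= 0.7)
    return result
end

If the crashes disappear with the explicit gc() calls, that would support your idea.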
I do not really know how to construct such a benchmark.
@time would be a good start; it reports allocations alongside the run time. Since the job is time-consuming, I would not recommend something more systematic like @benchmark from BenchmarkTools.jl, which runs the code many times.
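For example (the matrix multiplication is just a stand-in for your workload):

julia> A = randn(2000, 2000);

julia> @time A * A; # the report lists total bytes allocated and, if a collection ran, the % gc time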
Thanks, I will resort to memory hunting.
Is it possible to see the size of an object?
sizeof should do the job for you. However, if you are using a custom datatype, you might need to overload Base.sizeof for your types, since the default sizeof will report misleading values: for fields such as A::Matrix{Float64} it counts only the pointer size for your architecture. See the example below:
julia> A = randn(5,5);

julia> sizeof(A) # 5*5 Float64 entries at 8 bytes each
200

julia> struct MyType
           A::Matrix{Float64}
       end

julia> obj = MyType(A);

julia> sizeof(obj) # 8 bytes on 64-bit architectures (i.e., `A` is stored as a pointer)
8

julia> import Base.sizeof

julia> function sizeof(obj::MyType)
           res = 0
           # walk the fields and sum their sizes; use fieldnames(typeof(obj)) on Julia >= 0.7
           for field in fieldnames(obj)
               res += sizeof(getfield(obj, field))
           end
           return res
       end
sizeof (generic function with 8 methods)

julia> sizeof(obj) # now we get the correct value
200
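As a side note, Base.summarysize walks an object’s fields recursively, so you can also get a total byte count without defining a custom method (the exact number may differ slightly from sizeof because it includes object headers):

julia> Base.summarysize(obj) # counts the wrapped 5×5 matrix as well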
Thanks a lot for the help. This is all clear to me now.
@Tomas_Pevny, would you kindly update your findings about the memory leak? I am running into a similar issue related to reading files many times.
In the end, I found two bugs in my code:
- I was repeatedly using eval to introduce functions, which gradually bloated Julia’s internal table of methods.
- I was not closing streams from TranscodingStreams (see the sketch below).
Finding these two bugs was quite tedious, as I was bisecting my code, cutting it in halves, to identify the source of the problem.
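For the record, the fix for the second bug was essentially to guarantee that every codec stream gets closed. A minimal sketch (GzipDecompressorStream from CodecZlib and the process function are stand-ins for my actual pipeline):

using CodecZlib # codec package built on top of TranscodingStreams

open("data.bin.gz") do io
    stream = GzipDecompressorStream(io)
    try
        process(stream) # stand-in for the actual preprocessing
    finally
        close(stream)   # without this, the codec's native buffers are never released
    end
end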
Thanks very much for sharing.