More pressure on garbage collector

Hello,

Is it possible to manually increase pressure on the garbage collector? I have several workers whose job is to download large files from S3, load them, and preprocess them so that the master process does not have to do this work. After running for quite some time, Julia crashes because all the memory has been consumed. I suspect that the garbage collector does not run in time because of the distributed environment. Can I somehow verify this idea?

Thanks for answers,
Tomas

You can use gc() to force garbage collection (it is spelled GC.gc() in Julia 0.7 and later); however, I don’t think this is the root of your problem. Have you benchmarked your scripts to check memory usage and peaks?

help?> gc
search: gc gcd gcdx gc_enable eigvecs eigfact eigfact! logspace getsockname

  gc()

  Perform garbage collection. This should not generally be used.
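
One way to check whether the collector is merely running late is to force a full collection on every process and compare the live heap size before and after. The following is a minimal sketch, assuming Julia 1.x (hence GC.gc() rather than gc()); the helper name gc_report is hypothetical, and Base.gc_live_bytes is an unexported function that only counts memory managed by Julia's GC.

```julia
using Distributed

# Hypothetical helper: live heap bytes before and after a forced full collection.
@everywhere function gc_report()
    before = Base.gc_live_bytes()
    GC.gc()                          # full collection on the calling process
    after = Base.gc_live_bytes()
    return (before = before, after = after)
end

# Ask every process (master and workers) for its own report.
for p in procs()
    r = remotecall_fetch(gc_report, p)
    println("process ", p, ": ", r.before, " -> ", r.after, " live bytes")
end
```

If the numbers drop sharply on the workers, the collector was probably lagging; if they barely change while memory keeps growing, something is still holding references, or the memory is allocated outside Julia's heap.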

I do not really know how to construct such a benchmark.

@time would be a good start. Since the job is time-consuming, I would not recommend something more systematic like
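
A first measurement with @time could look like the sketch below. The function preprocess is a hypothetical stand-in for the real download, load, and preprocess step; @allocated returns the number of bytes allocated by the call, which is easier to log than the printed output of @time.

```julia
# Hypothetical stand-in for the real per-file work.
preprocess(n) = sum(abs2, randn(n))

preprocess(10)                               # run once so compilation is not measured

@time preprocess(1_000_000)                  # prints elapsed time, allocations, and GC share

bytes = @allocated preprocess(1_000_000)     # bytes allocated by this call
println(bytes / 2^20, " MiB allocated")
```

Note that these macros report how much was allocated during the call, not how much is still alive afterwards, so they point at heavy allocators rather than at leaks directly.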

Thanks, I will resort to memory hunting.
Is it possible to see the size of an object?

sizeof should do the job for you. However, for a custom datatype it reports only the size of the struct itself: a field such as A::Matrix{Float64} is stored as a reference, so it is counted as one pointer of your architecture's size rather than as the size of the array. To count the contents, you might need to overload Base.sizeof for your types. See the example below:

julia> A = randn(5,5);

julia> sizeof(A) # 5x5x8 bytes for Float64
200

julia> struct MyType
         A::Matrix{Float64}
       end

julia> obj = MyType(A);

julia> sizeof(obj) # 8 bytes on 64-bit architectures (i.e., `A` is simply a pointer)
8

julia> import Base.sizeof

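julia> function sizeof(obj::MyType)
         # Sum the sizes of the field contents instead of the reference sizes.
         res = 0
         for field in fieldnames(typeof(obj))  # fieldnames needs a type in Julia 0.7 and later
           res += sizeof(getfield(obj, field))
         end
         return res
       end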
sizeof (generic function with 8 methods)

julia> sizeof(obj) # now we get the size of the array data
200

See Base.summarysize.
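
For comparison, here is a minimal sketch of Base.summarysize on the same kind of type, without overloading anything. It assumes Julia 1.x; the exact number it returns includes array header overhead and varies between versions, so no value is shown.

```julia
struct MyType
    A::Matrix{Float64}
end

obj = MyType(randn(5, 5))

shallow = sizeof(obj)             # size of the struct itself: one reference
deep = Base.summarysize(obj)      # struct plus everything reachable from it

println("sizeof:      ", shallow, " bytes")
println("summarysize: ", deep, " bytes")
```

Because summarysize follows references recursively, it also handles nested containers, which a hand-written sizeof method would have to treat case by case.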

Thanks a lot for the help. This is clear to me now.

@Tomas_Pevny, would you kindly share your findings about the memory leak? I have a similar issue related to reading files many times.

In the end, I found two bugs in my code:

  1. I was repeatedly using eval to define functions, which slowly bloated Julia’s internal table of functions.
  2. I was not closing the streams created with TranscodingStreams.
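
Both kinds of bug can be avoided with patterns like the ones sketched below. This is only an illustration using Base, not the original code: make_scaler is a hypothetical example, and a temporary file stands in for the downloaded data. For a wrapper stream from a package, calling close on the wrapper inside a try/finally follows the same idea.

```julia
# 1. Instead of eval-ing a new function definition for every parameter value,
#    return a closure. No new global function is created per call.
make_scaler(k) = x -> k * x
double = make_scaler(2)

# 2. Close streams deterministically. The do-block form of open closes the
#    file when the block exits, even if an error is thrown inside it.
path, io = mktemp()
write(io, "1\n2\n3\n")
close(io)

total = open(path) do f
    sum(parse(Int, line) for line in eachline(f))
end

rm(path)
println(double(21), " ", total)
```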

Finding these two bugs was quite tedious, as I had to keep cutting my code in half to identify the source of the problem.

Thanks very much for sharing.