[Solved] Debugging an apparent memory leak

Hello all,

I'm running into a problem on Julia v0.5 and would like some advice on how to debug it. I have a script that takes several hours to run. While it runs, the memory usage of each worker slowly grows until the machine runs out of memory and the script is killed. Then I get sad because I've wasted several hours.

Without reproducing all of the details, the essential structure of the code looks like:

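# keep_going, get_next_input, save_to_file and big_matrix_I_only_want_to_move_once
# stand in for the real logic and data
function do_the_thing()
    @sync for worker in workers()
        @async begin
            # both channels live on the calling process
            input = RemoteChannel()
            output = RemoteChannel()
            # send the big matrix to the worker once; the worker then loops on `input`
            remotecall(start_worker_loop, worker, input, output, big_matrix_I_only_want_to_move_once)
            while keep_going()
                my_input = get_next_input()
                put!(input, my_input)
                my_output = take!(output)
                save_to_file(my_output)
            end
        end
    end
end

# must be defined on every worker, e.g. with @everywhere
function start_worker_loop(input, output, big_matrix)
    while true
        my_input = take!(input)
        my_output = long_computation(my_input, big_matrix)
        put!(output, my_output)
    end
end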

function long_computation(my_input, big_matrix)
    # lots of math and allocations, but I would expect that all of the allocations I make here
    # will be garbage collected every time this function returns...
end
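For experimenting with this structure in a single process, a minimal sketch of the same input/output protocol can be written with plain `Channel`s and a task instead of `RemoteChannel`s and workers (Julia 1.x syntax, not v0.5). `run_worker_loop`, `fake_long_computation`, `drive` and the `nothing` sentinel are hypothetical stand-ins; the sentinel is just one way to let the worker loop exit instead of blocking in `while true` forever.

```julia
# Stand-in for long_computation: allocates a temporary array, then reduces it.
fake_long_computation(x, big_matrix) = sum(big_matrix .* x)

# Same shape as start_worker_loop, but it stops when it receives `nothing`.
function run_worker_loop(input::Channel, output::Channel, big_matrix)
    while true
        my_input = take!(input)
        my_input === nothing && break          # sentinel: shut the loop down
        put!(output, fake_long_computation(my_input, big_matrix))
    end
    close(output)
end

# Plays the role of the @async block in do_the_thing.
function drive(inputs, big_matrix)
    input = Channel{Any}(1)
    output = Channel{Any}(1)
    worker = @async run_worker_loop(input, output, big_matrix)
    results = Float64[]
    for x in inputs
        put!(input, x)
        push!(results, take!(output))
    end
    put!(input, nothing)                       # ask the loop to stop
    wait(worker)                               # rethrows if the worker task failed
    return results
end

results = drive([1.0, 2.0, 3.0], ones(2, 2))
println(results)  # [4.0, 8.0, 12.0]
```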

To reiterate: the memory usage seems to grow linearly with time (possibly because of what goes on inside the long_computation function), and I don't even know where to start debugging this. Any advice or tips would be greatly appreciated.

Thanks for your time!
Michael
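A common first step with this kind of symptom is to check whether the growth is in memory the Julia GC manages or in memory allocated outside it (for example by a C or C++ library). A minimal sketch, assuming Julia 1.x: force a full collection after each unit of work and log both `Base.gc_live_bytes()` (size of live Julia objects) and `Sys.maxrss()` (peak resident set size of the process). If the live-heap number stays roughly flat while the peak RSS keeps climbing, the memory is most likely being held outside Julia's heap. `track_memory` is a hypothetical helper, not part of any library.

```julia
# Run `step()` n times, forcing a full GC after each call, and record
# the Julia heap size and the process's peak resident set size.
function track_memory(step, n::Integer)
    live = Int[]   # bytes held by live Julia objects after the GC
    rss = Int[]    # peak RSS of the process so far, in bytes (never decreases)
    for _ in 1:n
        step()
        GC.gc()    # full collection; also gives pending finalizers a chance to run
        push!(live, Base.gc_live_bytes())
        push!(rss, Int(Sys.maxrss()))
    end
    return (live = live, rss = rss)
end

stats = track_memory(() -> sum(rand(10_000)), 5)
for (i, (l, r)) in enumerate(zip(stats.live, stats.rss))
    println("iteration $i: live heap = $(l ÷ 2^20) MiB, peak RSS = $(r ÷ 2^20) MiB")
end
```

On a cluster, the same helper could be run on each worker around its own calls to the real computation, so that each worker's numbers can be compared over time.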

Updating this thread now that I’ve figured out what the problem was. Hopefully this helps somebody coming from Google (hi!).

The problem turned out to be unrelated to the code structure I outlined in the previous post; the culprit was what happens inside the long_computation function.

Essentially I wrote a wrapper around a C++ library that does things like

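# Julia v0.5 syntax (`type`, `Ptr{Void}`, `finalizer(obj, f)`)
type MyCxxType
    ptr :: Ptr{Void}   # all the Julia GC sees: one pointer-sized field
end

function MyCxxType()
    ptr = ccall( ... ) # new CxxType()
    output = MyCxxType(ptr)
    finalizer(output, delete)   # free the C++ object when the GC collects `output`
    output
end

function delete(input::MyCxxType)
    ccall( ... ) # calls delete
end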

The ccalls simply call new and delete for the corresponding type in C++.
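To reproduce the pattern without a C++ library, one option is to let `Libc.malloc`/`Libc.free` stand in for `new`/`delete`. The sketch below uses Julia 1.x syntax (`mutable struct`, `Ptr{Cvoid}`, and `finalizer(f, x)` with the function first); `FakeCxxObject`, `free_fake!` and the `LIVE_OBJECTS` counter are hypothetical, added only to show when the finalizer actually runs. It illustrates the point made below: the Julia object is a single pointer, however large the buffer behind it is.

```julia
# Counts external buffers that have been allocated but not yet freed.
const LIVE_OBJECTS = Ref(0)

mutable struct FakeCxxObject
    ptr::Ptr{Cvoid}   # all the GC sees: one pointer-sized field
end

function FakeCxxObject(nbytes::Integer)
    ptr = Libc.malloc(nbytes)          # stands in for `new CxxType()`
    ptr == C_NULL && throw(OutOfMemoryError())
    LIVE_OBJECTS[] += 1
    obj = FakeCxxObject(ptr)
    finalizer(free_fake!, obj)         # Julia >= 0.7 argument order
    return obj
end

function free_fake!(obj::FakeCxxObject)
    if obj.ptr != C_NULL               # guard against freeing twice
        Libc.free(obj.ptr)             # stands in for `delete`
        obj.ptr = C_NULL
        LIVE_OBJECTS[] -= 1
    end
    return nothing
end

obj = FakeCxxObject(100 * 2^20)        # 100 MiB outside the Julia heap...
println(sizeof(obj))                   # ...but the Julia object holds only one pointer
finalize(obj)                          # run the finalizer now instead of waiting for the GC
println(LIVE_OBJECTS[])                # 0
```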

I believe the problem here is similar to https://github.com/JuliaLang/julia/issues/11698

Essentially, the Julia garbage collector doesn't know how large MyCxxType really is (it thinks it's just a pointer). Because each object looks tiny, the GC feels little pressure to run a collection, so the finalizers are rarely called and the memory allocated on the C++ side keeps piling up. The solution is therefore to not do this: write the wrapper in a way that you're not storing a Ptr{Void} as a field. It turns out that that was a mistake…
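The post doesn't show the rewritten wrapper. One commonly used alternative (not necessarily what was done here) is to stop relying on the GC's timing and free the C++ object deterministically, for example with a do-block helper that calls `finalize` as soon as the work is done. A minimal sketch in Julia 1.x syntax, again with `Libc.malloc`/`Libc.free` standing in for the ccalls to `new`/`delete`; `ScopedCxxObject`, `release!` and `with_cxx_object` are hypothetical names.

```julia
mutable struct ScopedCxxObject
    ptr::Ptr{Cvoid}
end

# Idempotent release, safe to reach from both `finalize` and a later GC.
function release!(obj::ScopedCxxObject)
    if obj.ptr != C_NULL
        Libc.free(obj.ptr)        # stands in for the ccall to `delete`
        obj.ptr = C_NULL
    end
    return nothing
end

function ScopedCxxObject(nbytes::Integer)
    ptr = Libc.malloc(nbytes)     # stands in for the ccall to `new CxxType()`
    ptr == C_NULL && throw(OutOfMemoryError())
    # keep a finalizer as a safety net in case the object escapes the do-block
    return finalizer(release!, ScopedCxxObject(ptr))
end

# Create the object, hand it to `f`, and free it right afterwards,
# even if `f` throws.
function with_cxx_object(f, nbytes::Integer)
    obj = ScopedCxxObject(nbytes)
    try
        return f(obj)
    finally
        finalize(obj)             # runs release! now rather than at some later GC
    end
end

# Inside something like long_computation, each temporary object is freed on exit:
ok = with_cxx_object(2^20) do obj
    obj.ptr != C_NULL             # real code would pass obj.ptr to ccalls here
end
```

A blunter workaround is to call `GC.gc()` every so often in the worker loop so that pending finalizers get a chance to run, at the cost of extra collection time.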
