Julia + Notebook memory leak

Hi,

I’ve started trying out JupyterLab with IJulia to see if I could avoid repeating some large data-loading steps that take a lot of time on every run.

The CSV I’m working on is fairly large (2 GB) and takes around 6 GB of RAM once converted to a DataFrame (should I use JuliaDB at this scale?). The size itself is not an issue on my system. The real problem is that I can’t figure out how to free this memory once I’m done with it.

If I run my script normally, without a notebook, the memory is obviously freed once the program exits. In an interactive environment, however, the memory is kept… even if the kernel is stopped and closed! Looking for ways to clear the memory, I found that reassigning the variable, e.g. df = missing, was supposed to work, but even after doing that the RAM usage stays up. That means I have 6 GB of memory that is lost: it is no longer linked to the reassigned variable, so I have no way to access it anymore.

So the question is: is there a way to clear the memory in an interactive notebook? Should interactive notebooks be used at all in Julia? If I cannot use a notebook without a memory leak, how could I load large data files without slowing down my iterative process too much?

Thanks :slight_smile:

There was a discussion on this already here: How to release memory from jupyter notebook? - #5 by nilshg

You can try the empty!(Out) command, but that would only do anything if you have large outputs in your cells.
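For reference, here is a minimal sketch of the cleanup sequence discussed above: drop the reference, clear Out if it exists, then request a garbage collection. The name big and its size are stand-ins for the real DataFrame, and the isdefined check is only there so the snippet also runs outside IJulia. Whether the RAM figure reported by the OS actually goes down afterwards is not guaranteed, since freed memory is not necessarily handed back to the OS right away.

```julia
# Stand-in for the large DataFrame (hypothetical name and size, roughly 8 MB).
big = rand(10^6)
bytes_before = Base.summarysize(big)   # size of the object while it is still referenced

# Remove the only reference so the array becomes unreachable.
big = nothing

# In IJulia, `Out` keeps references to cell results; clear it when it exists.
# Outside a notebook `Out` is not defined, so this branch is skipped.
if isdefined(Main, :Out) && Main.Out isa AbstractDict
    empty!(Main.Out)
end

# Ask for a full garbage collection now instead of waiting for the next automatic one.
GC.gc()

println("Released a reference to about ", bytes_before, " bytes")
```

Note that if the same object is still reachable from anywhere else (another variable, a view, a cell result in Out), the collector cannot free it.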

I’ve been experiencing similar issues and have never been able to fully resolve them, nor to produce anything reproducible and diagnosable enough to file an issue (this is just something that crops up after a few hours of analysis on large data sets). One thing that has helped me is switching large string columns in my data to ShortStrings and PooledArrays, which generally speeds up all sorts of operations on DataFrames and also seems to mitigate the slow memory creep you describe.
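To illustrate the PooledArrays part, here is a rough sketch assuming PooledArrays.jl is installed. The column labels is made up for the example (many rows, only four distinct values); with a real DataFrame you would apply the same conversion to a string column. ShortStrings is left out here.

```julia
using PooledArrays  # assumes the PooledArrays.jl package is installed

# Hypothetical string column: 100_000 rows but only four distinct values.
# Interpolation creates a separate String object for every row, similar to
# what a freshly parsed text column can look like.
labels = ["region_$(rand(1:4))" for _ in 1:100_000]

# Pooled version: each distinct string is stored once, and the rows hold
# small integer codes pointing into that pool.
pooled = PooledArray(labels)

plain_bytes  = Base.summarysize(labels)
pooled_bytes = Base.summarysize(pooled)

println("Vector{String}: ", plain_bytes, " bytes")
println("PooledArray:    ", pooled_bytes, " bytes")
```

The pooled column still behaves like an ordinary vector of strings (indexing, comparison, iteration), so most downstream code should not need to change. The saving depends on how repetitive the column is; a column of mostly unique strings gains little.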

I see, thank you for the topic reference, I missed that one! I know that output size is not the issue here; it really is plain memory allocation for the DataFrame. I’ll try ShortStrings and PooledArrays to see if they help with my problem.