"Pluto" memory usage

I had a session of pluto open and I was doing some standard machine learning tasks, like:

  • loading a data frame with ~3000 entries
  • doing some clustering
  • training some models
  • evaluating some models
  • making plots to see what’s going on

I ended up with this memory situation:

[screenshot: Activity Monitor showing the Firefox tab and Julia processes]

Also, note that the Firefox tab at the top is Pluto's tab. I checked by force-quitting its process and seeing which tab crashed in my browser; it was the Pluto one.

The Firefox tab plus the Julia processes accounted for ~5GB of RAM, which is most of my 8GB. As a result everything has slowed down… a lot. I type, then stare at the computer for a long time waiting for something to happen. Execution also takes a long time, because of swap, I'm guessing. When I scroll up and down the page I have to wait a while for Firefox to render anything.

I tried to do some math to see which variables were holding onto that much memory… I couldn't make it add up to 2GB in either the browser or the Julia process. I got about 500MB of matrices, and I'm unsure how to count the plots.
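
Roughly how I tallied it (a sketch: Base.summarysize walks an object plus everything it references; note that Pluto evaluates cells in its own workspace module, so run from a plain REPL this only sees Main):

# Report any binding in Main holding more than ~1 MB.
for name in names(Main)
    isdefined(Main, name) || continue
    val = getfield(Main, name)
    val isa Module && continue                # skip Main, Base, Core, packages
    sz = Base.summarysize(val)
    sz > 1_000_000 && println(name, " => ", round(sz / 2^20; digits = 1), " MiB")
end

GC.gc()   # then force a full collection to drop anything unreferenced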

How can I find out what's taking up resources, reduce the memory usage, trigger a GC, or work with Pluto in a less memory-hungry way?
It would be a cool feature for Pluto to tell me what's taking up memory, so I can free it if I need to. Or to let me mark something as 'transient' if I know it's only used in one place and I don't want to hold onto that variable.

thanks in advance.

A Julia session that has nothing but:

using DataFrames
df = DataFrame(rand(3000, 10), :auto)

uses about 1.7GB of RAM, and I'd imagine you're using more than one package, so I would say what you're seeing is reasonable. Since Pluto launches its own separate Julia process, you would see about 4GB of RAM pretty quickly.

This isn't a lot of data, so I'm not sure how much it helps here, but slowdowns in Pluto have been observed when many plots with lots of points are rendered. You could try changing your plot output to PNG format (plot(mydata, fmt = :png)) to see if that helps.
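
For example, with Plots.jl you can set it once as a default (a minimal sketch; mydata stands in for your own series):

using Plots
default(fmt = :png)        # render every subsequent plot as a raster PNG
mydata = rand(3000)        # placeholder for your own data
plot(mydata)               # inherits the PNG default
plot(mydata, fmt = :png)   # or set it per plot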

I don’t see those results…

$ julia 
# memory usage is 97MB for the julia process
julia> using DataFrames
# memory usage increases to 192MB
julia> data = DataFrame(randn(3000, 10), :auto)
# memory usage is now 228MB

^ I repeated this a few times to make sure; it was pretty stable.

So this suggests the DataFrame is taking about 36MB?

For interest, I tried doing:

$ julia
# memory usage is 90MB again
> data = randn(3000, 10)
# memory usage is 137MB

So the matrix takes about 40MB to store?
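
Asking Julia about the object itself, rather than the process, suggests the object is tiny and the rest is overhead:

data = randn(3000, 10)
sizeof(data)            # 240_000 bytes: the raw Float64 payload (3000 × 10 × 8)
Base.summarysize(data)  # about the same; so the ~40MB jump is process overhead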

For comparison in python:

$ bpython
# memory usage 20MB
> import numpy as np
# memory usage 34 MB
> data = np.random.rand(3000, 10)
# memory usage STILL 34MB, the 240KB of space didn't register!

So the 1.7GB isn't showing up for me, and I still can't explain where all that memory is coming from.
I'll try the plots next, although I'm pretty sure I'm already using png?

How are you reading off the memory? I'm looking at VIRT in top.

I like using the Plotly backend to Plots.jl, but the interactive plots use so much memory that I have to be careful about how many plots and how many points I use. This seems to be JavaScript memory usage much more than Pluto or Julia. Looking at the page source can show that all of your data points may be rendered as strings and take a lot more memory than expected.

What if you put a ; at the end of the data = randn(3000, 10) line so as not to show it? I think the vast majority of the memory usage you're seeing comes from printing. Also try running GC.gc() after allocating the array.
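
Something like this, as a sketch:

data = randn(3000, 10);   # trailing semicolon suppresses printing the 3000×10 matrix
GC.gc()                   # then force a full collection to reclaim any display buffers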

I'm reading the memory from Activity Monitor on my Mac. (Note that VIRT in top counts reserved virtual address space, which Julia maps generously, so it overstates actual RAM use; RES is the better measure, and that may be where the 1.7GB came from.) Here's the same thing done with top instead:

$ julia 
# memory usage is 92MB for the julia process
julia> using DataFrames
# memory usage increases to 189MB
julia> data = DataFrame(randn(3000, 10), :auto)
# memory usage is now 243MB

I checked between Activity Monitor and top and they report similar results; they're not off by much.
That's the console again, so it still can't explain what I'm doing to make Pluto take gigabytes of memory. I would have to be holding onto about forty of these large dataframes to reach 2GB of memory usage, and eighty of them for 4GB.

I'm going to investigate what others have said about suppressing printing and removing plots, and see how much memory I use. I guess the way to answer this is to do a few things in Pluto at a time and watch how much memory gets used until I find the culprit.

Thank you all for the help… I'll report back when I know more.

Here’s the report of me tracking memory usage in Pluto:

$ julia
# 90MB or so
> using Pluto
# 175MB or so
> Pluto.run()
# 220MB or so

I opened my Pluto notebook in a text editor and commented everything out, then slowly added cells back one at a time, reporting memory after running each cell. Firefox's about:performance and my Activity Monitor disagree strongly about Firefox's memory. I'm going to report Firefox's own numbers, because I'm not 100% sure what's going on with the FirefoxCP Web Content processes:

> opening pluto window
# memory is ~300MB across two julia processes and 30MB in firefox
> import Distances
# nothing changed, same memory usage
> define some distance-related functions
# no change in memory (phew, that would have been strange)
> import CSV, DataFrames, Tables, Glob
# one julia process went from 300MB -> 380MB, firefox at 30MB
> import MLJ, Distributions, Clustering
# now one julia process with 500MB, another with 300MB, and firefox still at 30MB (although 120MB in Activity Monitor, I think)
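
(For anyone repeating this: instead of eyeballing top, the process's resident memory can be read from inside Julia. A sketch that shells out to ps, so macOS/Linux only:)

rss_mb() = parse(Int, strip(readchomp(`ps -o rss= -p $(getpid())`))) ÷ 1024  # ps reports KB
before = rss_mb()
import MLJ
println("loading MLJ added ~", rss_mb() - before, " MB resident")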

Loading libraries accounts for 500MB of the ~4GB of RAM usage I was seeing. This seems like a lot!
Now to actually do something:

> load data from CSV files, 3000×50-ish (takes about 20s)
# julia processes 520MB, 310MB; firefox still at 30MB
> build dataframe, separate features/labels
# julia 530MB, 310MB; firefox 30MB
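
(A sketch of what that loading cell looks like; the paths are illustrative:)

using CSV, DataFrames, Glob
files = glob("*.csv", "data")                               # illustrative directory
df = reduce(vcat, [CSV.read(f, DataFrame) for f in files])  # stack into one DataFrame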

Machine-learning activities test:
> separate train/test data
# no change
> load classifier and make machine
# julia 575MB, 310MB; firefox no change
> fit & predict
# julia 600MB, 310MB; firefox no change (although Activity Monitor shows the tab process taking 140MB)
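
(A sketch of those cells; the actual classifier isn't shown in my notes, so DecisionTreeClassifier is a stand-in, and X/y are the features/labels built earlier:)

using MLJ
Tree = @load DecisionTreeClassifier pkg=DecisionTree
train, test = partition(eachindex(y), 0.7)   # separate train/test data
mach = machine(Tree(), X, y)                 # make machine
fit!(mach; rows = train)                     # fit
ŷ = predict(mach, rows = test)               # predict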

So, comparatively, loading the initial libraries took a lot of memory, and the actual work I'm doing with the matrices comes to less than 100MB.

Clustering activities:
> compute full 3000×3000 distance matrix
# julia 700MB, 300MB; firefox still 30MB, although Activity Monitor reports Firefox taking 1.5GB overall and 140MB for the process I think Pluto's tab is on
> build clusters with Clustering.hclust
# julia 710MB, 300MB; firefox still 30MB, but note the caveat above
> compute Clustering.vmeasure
# no change, thank goodness
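
(Sketched out, those cells look like this; `features` stands in for the feature matrix built earlier, one column per observation, and `true_labels` for the known classes:)

using Distances, Clustering
D = pairwise(Euclidean(), features; dims = 2)   # full 3000×3000 distance matrix
hc = Clustering.hclust(D; linkage = :average)   # hierarchical clustering
assignments = cutree(hc; k = 5)                 # k = 5 is illustrative
Clustering.vmeasure(assignments, true_labels)   # compare to the known classes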

This didn't bust anything either. The largest chunk, 500MB, still seems to come from loading libraries. The only other thing I've done is plotting, so my memory woes are probably there; let me just make sure:

> import StatsPlots, set default to png (takes almost 3m)
# julia #1 750MB + 350MB of compressed memory now, #2 310MB; firefox says 41MB for the tab, Activity Monitor suggests 150MB, and Firefox overall is taking 1GB
> StatsPlots.plot(clusts) (takes 50s)
# julia #1 950MB + 200MB compressed, #2 300MB; firefox says 41MB for the tab, although Activity Monitor is up to 200MB

Oof, I think I need an alternative to StatsPlots if it takes 400MB just to plot a dendrogram…

This is the current state:

[screenshot: memory usage at this point]

The other plot was a distance matrix heatmap, let me do that too:

> import GR; GR.heatmap(distance_matrix) # for speed
# julia #1 up to 1.14GB + 431MB compressed, #2 300MB + 115MB compressed; Firefox still reports 33MB, Activity Monitor shows the tab taking 175MB
> import WGLMakie (replacing GR, so that library should be unloaded by Pluto), set format to png (takes 200s)
# julia #1 up to 1.1GB + 800MB compressed, #2 330MB + 300MB compressed; Firefox tab 36.3MB, Activity Monitor 215MB
> WGLMakie.heatmap(distance_matrix)
# julia #1 up to 1.64GB + 175MB compressed, #2 about the same; Firefox stats same as above

I start to see Firefox say: “A webpage is slowing down your browser, what would you like to do?”
Firefox stubbornly claims the tab is still taking only 40MB, which I think is wrong given what the system is saying.

Then I tried replacing WGLMakie with GLMakie, and Julia's memory jumped to 2GB. I can see how just evaluating more plots will keep piling onto Julia's memory usage.

As a test, I ran GC.gc(). After:

[screenshot: memory usage after GC.gc()]

So the conclusion is that ~500MB comes from dependencies, ~200MB from my matrices and dataframes, and the rest (~1GB) is all plotting infrastructure: loading the plotting packages and using their routines.

Plots/StatsPlots are very slow for my use case (large distance-matrix heatmaps especially, though large dendrograms too), so I'm trying GR or Makie, whichever is faster. In Pluto, GR is fairly fast and produces an SVG blob. Makie is very slow (at least GLMakie and WGLMakie are).

I can’t see an option for png on either?
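
A couple of possibilities to try, sketched here but not verified in Pluto (GR.inline switches GR's inline mime type, if it's available in your GR version; CairoMakie is a raster/vector backend that can save PNGs, unlike the WebGL-based WGLMakie):

import GR
GR.inline("png")                 # ask GR for inline PNG instead of SVG
GR.heatmap(distance_matrix)

import CairoMakie
fig, ax, hm = CairoMakie.heatmap(distance_matrix)
CairoMakie.save("heatmap.png", fig)   # write the figure out as a PNG file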