Memory limitation for large notebooks

I’ve been appreciating Pluto for its reactivity and consistency, but I’m now wondering whether the reactive model hits limits for large reports. I want to use Pluto for reports that involve repeated computations against large datasets, and I’d like each dataset’s memory to be freed once its section has been generated. Is this impossible under Pluto’s reactive model? Is Jupyter the only way to create such reports?

Not sure how your datasets map to variables, but you can use functions or let blocks to define local scopes. Variables that are local to a scope will be garbage collected as usual after they go out of scope. So, for example, you can write a function that loads the dataset, does all the processing (perhaps by calling other functions defined in other cells), and returns whatever processing results/figures you need, but does not return the variable holding the dataset.
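As a minimal sketch (the names `load_dataset`, `summarize`, and `make_figure` are hypothetical placeholders for your own code):

```julia
# Keep the big dataset in a local variable so it becomes
# unreachable, and thus collectible, once the function returns.
function process_section(path)
    dataset = load_dataset(path)   # local; eligible for GC after return
    summary = summarize(dataset)   # small, condensed result
    fig = make_figure(summary)
    return summary, fig            # the dataset itself is not returned
end
```

Only the small returned values become reactive variables in the notebook; the dataset never escapes the function’s scope.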


Got it. I basically have to prevent large data objects from becoming reactive variables, and limit reactivity to the small objects condensed from each dataset.


A bit hacky, but you can also wrap a dataset in a Ref or array and overwrite the object once the computation is done:

ds_cache = Any[load_dataset()]

let dataset = ds_cache[]
    ...
end

# more cells if needed

ds_cache[] = nothing

This works if you need the loaded dataset to persist through multiple cells. The catch is that `ds_cache[] = nothing` is not guaranteed to run after the computation cells: Pluto orders execution by variable dependencies, so when you restart the notebook, the clearing cell may run before the cells that use the data.

Those are some good tips!

Reactive variables in Pluto should also get garbage collected. E.g. a cell with hello = rand(Bool, 1_000_000) should allocate, and changing it to hello = [false, true] should lead to the old data (1MB) getting freed from memory.

But I can imagine that this doesn’t always work properly; GC is a difficult topic. We don’t have tests for this right now, because I didn’t know how to write one. It would really help if someone could:

  1. Find a clear, simple example showing GC not working properly for reactive variables.
  2. Write a reliable piece of Julia code that can detect if the object was freed from memory or not. (We could use this in our testing.)
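For item 2, one possible approach (a sketch, not a guaranteed-reliable test, since `GC.gc()` forces a collection but collection of any particular object is not strictly guaranteed) is to hold only a `WeakRef` to the object and check whether it survives a collection:

```julia
# Detect whether an object was freed: keep only a weak reference,
# drop the strong reference, then force garbage collection.
big = rand(Bool, 1_000_000)
w = WeakRef(big)       # does not keep `big` alive
big = nothing          # drop the strong reference
GC.gc(); GC.gc()       # full collections; usually (not provably) enough
was_freed = w.value === nothing
```

If `was_freed` is `true`, the old array was collected; a test could assert this after redefining a reactive variable.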

If I understand correctly, this refers to when you replace the definition of a reactive variable by editing and rerunning a cell? You can’t have hello = rand(Bool, 1_000_000) in one cell and then hello = [false, true] in a later cell, can you?

I understood the OP as being concerned about memory use that remains even in an end-to-end run of a notebook in its “final” report form. In Jupyter they’d reuse the same global variable in one cell after the other, for one dataset after the other, allowing the GC to collect old datasets once no longer referenced. With reactivity, you can’t reassign global (reactive) variables, hence the suggestion to keep datasets only in local (non-reactive) variables.

In other words, I don’t think there’s any issue with the GC not working properly here.