Memory limitation for large notebooks

I’ve been appreciating Pluto for its reactivity and consistency, but I’m now wondering whether the reactive model hits limits for large reports. I want to use Pluto for reports that involve repeated computations against large datasets, and I’d like each dataset’s memory to be freed once its section has been generated. Is this impossible under Pluto’s reactive model? Is Jupyter the only way to create such reports?

Not sure how your datasets map to variables, but you can use functions or let blocks to define local scopes. Variables that are local to a scope will be garbage collected as usual after they go out of scope. So, for example, you can write a function that loads the dataset, does all the processing (perhaps by calling other functions defined in other cells), and returns whatever processing results/figures you need, but does not return the variable holding the dataset.
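As a minimal sketch (the names `load_dataset`, `summarize`, and `make_figure` are hypothetical placeholders for your own code):

```julia
# Keep the big dataset in a local variable so it becomes
# unreachable, and thus collectible, once the function returns.
function process_section(path)
    dataset = load_dataset(path)   # local; eligible for GC after return
    summary = summarize(dataset)   # small, condensed result
    fig = make_figure(summary)
    return summary, fig            # the dataset itself is not returned
end
```

Only the small returned values become reactive variables in the notebook; the dataset never escapes the function’s scope.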


Got it. I basically have to prevent large data objects from becoming reactive variables, and limit reactivity to the small objects condensed from each dataset.


A bit hacky, but you can also wrap a dataset in a Ref or array and overwrite the object once the computation is done:

ds_cache = Any[load_dataset()]

let dataset = ds_cache[]
    ...
end

# more cells if needed

ds_cache[] = nothing

This works if you need the loaded dataset to persist through multiple cells. The catch is that `ds_cache[] = nothing` is not guaranteed to run after the computation cells: Pluto orders execution by variable dependencies, so when you restart the notebook, the clearing cell may run before the cells that use the data.

Those are some good tips!

Reactive variables in Pluto should also get garbage collected. E.g. a cell with hello = rand(Bool, 1_000_000) should allocate, and changing it to hello = [false, true] should lead to the old data (1MB) getting freed from memory.

But I can imagine that this doesn’t always work properly; GC is a difficult topic. We don’t have tests for this right now, because I didn’t know how to write one. It would really help if someone could:

  1. Find a clear, simple example showing GC not working properly for reactive variables.
  2. Write a reliable piece of Julia code that can detect if the object was freed from memory or not. (We could use this in our testing.)
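For item 2, one possible approach (a sketch, not a guaranteed-reliable test, since `GC.gc()` forces a collection but collection of any particular object is not strictly guaranteed) is to hold only a `WeakRef` to the object and check whether it survives a collection:

```julia
# Detect whether an object was freed: keep only a weak reference,
# drop the strong reference, then force garbage collection.
big = rand(Bool, 1_000_000)
w = WeakRef(big)       # does not keep `big` alive
big = nothing          # drop the strong reference
GC.gc(); GC.gc()       # full collections; usually (not provably) enough
was_freed = w.value === nothing
```

If `was_freed` is `true`, the old array was collected; a test could assert this after redefining a reactive variable.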

If I understand correctly, this refers to when you replace the definition of a reactive variable by editing and rerunning a cell? You can’t have hello = rand(Bool, 1_000_000) in one cell and then hello = [false, true] in a later cell, can you?

I understood the OP as being concerned about memory use that remains even in an end-to-end run of a notebook in its “final” report form. In Jupyter they’d reuse the same global variable in one cell after the other, for one dataset after the other, allowing the GC to collect old datasets once no longer referenced. With reactivity, you can’t reassign global (reactive) variables, hence the suggestion to keep datasets only in local (non-reactive) variables.

In other words, I don’t think there’s any issue with the GC not working properly here.