Don't run all cells on startup and only update cells that have been run at least once

Hello,

As far as I can tell, there is no way to avoid running the whole notebook on startup, which is a shame.
I think it would be very beneficial to simply have a “Trust” button that exits safe mode without running the whole notebook.

This would be especially useful when paired with not updating cells that have never been run in the session. That is a very helpful workflow with big notebooks.

For example, I have a notebook that takes 15 minutes to run, of which 2-3 minutes are preprocessing steps and the rest is the meat of the calculations.

  1. First pain point: running the whole notebook at the start forces me to wait 15 minutes before I can start modifying the earlier steps;
  2. Second, related pain point: if the whole notebook has run once and I need to tweak a step at the start of the pipeline, I am stuck in a 15-minute feedback loop. (If interrupting worked on Windows, I probably wouldn’t be here, but oh well.)

It would then be nice if we could manually run each cell and treat all the unrun cells as “disabled” until they are run for the first time, so that the propagation stops and you can work on smaller branches of the topology.

Why not use Jupyter?

I want to keep the deterministic execution order and auto update, as well as having a plain Julia file behind the notebook.

For me, Pluto starts in “Safe mode”, without executing any cells. From there, I can select “Disable cell” in the cell menu to not run them on startup. The disabled state seems to be persistent between Pluto restarts.

So, the option is there for me, just not as the default behavior.

The disable cell trick is fine as long as you have few cells. But I have 100+ cells, and I don’t want to have to enable and disable them manually each time, let alone remember which cells I enabled and which I disabled…

Disabling cells works on the same non-linear reactive dependency graph as reevaluations. In other words, manually disabling one cell will automatically disable all dependent cells in its branch. Manually disabled cells will fade in a visually distinct way from automatically disabled cells. Manually enabling a cell will run it and automatically enable and run all dependent cells in its branch that haven’t been manually disabled.
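
As a rough illustration of that propagation rule, here is a minimal sketch of a toy dependency graph. It assumes a made-up five-cell notebook. The names `children`, `downstream`, and `enable!` are hypothetical helpers, not anything from Pluto's actual code, which may well handle this differently.

```julia
# Toy model of disable propagation; every name here is hypothetical
# and this is not Pluto's internal code.

# children[c] lists the cells that directly depend on cell c.
children = Dict(
    :load    => [:clean],
    :clean   => [:fit, :summary],
    :fit     => [:plot],
    :summary => Symbol[],
    :plot    => Symbol[],
)

# Every cell reachable from the start cells, including the start cells.
function downstream(children, starts)
    seen = Set{Symbol}()
    stack = collect(Symbol, starts)
    while !isempty(stack)
        cell = pop!(stack)
        cell in seen && continue
        push!(seen, cell)
        append!(stack, children[cell])
    end
    return seen
end

# Manually enable `cell` and return the cells that would run as a result:
# its branch, minus anything still covered by another manual disable.
function enable!(children, manual, cell)
    delete!(manual, cell)
    still_disabled = downstream(children, manual)
    return setdiff(downstream(children, [cell]), still_disabled)
end

manual = Set([:fit, :summary])               # cells disabled by hand
disabled = downstream(children, manual)      # :fit, :summary, and :plot
to_run = enable!(children, manual, :fit)     # :fit and :plot
```

In this toy model, `:plot` is never disabled by hand. It is only disabled as long as `:fit` is, which is why enabling `:fit` brings it back while `:summary` stays off.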

If you want to manually disable one branch for edits before running it at once, then this is simple and I expect this to solve the 1st pain point and the first half of the 2nd pain point. If you’re manually disabling several branches, it’s hard to distinguish them. If you want to gradually evaluate a branch cell by cell, then you’d have to keep shifting manual disables and enables down the branch, and spotting the branching cells is hard. I asked about Jupyter earlier because evaluating cell by cell in a linear non-reactive notebook is simpler, with obvious tradeoffs.

As for the second half of the 2nd pain point, the >100 cells running for 15 minutes on its own doesn’t sound like a good fit for notebooks. It’d be ideal if that could be broken down into smaller notebooks or functions, but I have no idea if that’s appropriate. Just in case, try holding Ctrl and mashing Q and see if that pulls off a notebook interrupt.

Besides the disable feature (i.e., disabling the “latest” cell in the dependency graph), I think your two feature ideas both have at least a UI problem, quite apart from the fact that someone would have to sit down and implement them.

What would be the alternative to running all? Indicating which have been run and which not? Then you would manually trigger a cell and it would check and run all cells it needs to run this cell? (If they were not manually run before)
That would require a good UI indicator of which cells have already been run and which have not, and it would break with the persistent state Pluto has (which I love).

What would be the alternative to the second? Not running it would again break with Pluto’s persistent state design idea: the whole notebook is always in a state where all variables reflect values that correspond to the current state of the code.
You also wrote that you want to have “auto update”, but that would indeed require also updating all cells later in the workflow in your pain point 2; otherwise it is not auto update.

Thank you,

I was not aware of the propagating effect of the disable cell. Although in hindsight it should have been obvious since disabling a cell invalidates all descendants. I missed that manual section completely.

Fortunately, it’s fine to just run the notebook in major blocks, with no need for cell-by-cell increments, so it should work well.

You are right that the notebook is not very suited to such big workflows, but it kind of grew into one and I haven’t had the time to properly split it into smaller units. Although I must say, I am overall impressed at how Pluto handles it.

Sadly, interrupting is completely off the table, at least on Windows as far as I can tell. Pluto spams fire emojis :fire: but nothing happens.
I am forced to start spamming Ctrl-C in the terminal, but that usually brings down the whole Julia session in a spectacular crash.

This was a bit of a frustration post (not entirely due to poor Pluto). I should have taken a walk instead. But in any case,

The alternative to “run all” could simply be “trust”. I initially proposed both ideas together since one without the other indeed does not make much sense.

The “not run yet” indicator could just be a simple icon. The idea for “not yet run” cells was just that they exist in the notebook but are simply not present in the cell graph.
There’s no change needed in how Pluto runs cells; you simply hide them from it until you run them the first time.

This does not break the consistency: all dependent cells that see each other will be run in order. When you are first writing the notebook, subsequent cells don’t exist yet because you haven’t written them, but the notebook is still in a consistent state up to that point.

To me that sounds nearly like a button that could be “Trust notebook but deactivate all cells”? Then you could decide which ones to enable?
I think that could come quite close to what you had in mind…

And sure, I see the frustration when you have to drink so much coffee because there always is a 15 minute waiting time :wink:

I don’t really see omitting unrun cells from the reactive graph or hard-disabling all cells as tenable ideas, let alone time-savers. A notebook starts as unrun, so none of the cells would be in the active graph. Adding cells to the graph isn’t straightforward:

  1. If we only add cells to the graph manually, dependent cells can’t be added automatically. None of us want to add every single cell manually, even in much smaller notebooks.
  2. Assuming the graph is one (directed) tree, maybe we want to manually add the root cell and automatically add its child cells. But that’s every cell, so we don’t save any time. It’s not possible to only add a subtree via a non-root cell because its parent cell has not run. A possible time-saver is if the graph is several unconnected trees, but that’s more of a justification for separate notebooks.
  3. The graphs aren’t actually trees, they are directed acyclic graphs (I think) that can have multiple roots. For a simple example, consider a notebook with cells labeled C,E,B,D,A (graph below), E being the expensive cell. Even before considering saving time, we have to manually enable the 3 (hopefully labeled) root cells A,B,C to run E; to adjust the rule in (2) accordingly, manually adding a cell would automatically add child cells whose parent cells are all enabled. To save time, we’d have to further adjust the rule to never automatically disable hard-disabled cells and manually hard-disable E before enabling B to trigger only D. At this point, we manually tweaked 4/5 cells just to avoid E. Starting with a full active graph (the status quo) actually saves us work because we skip right to disabling expensive cells and subgraphs without figuring out a sufficient combination of root cells for preprocessing.
C
|
-->E
   ^   B
   |   |
   D <--
   ^
   |
   A
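
To make the counting in (3) concrete, here is a minimal sketch of the adjusted rule on that five-cell graph. It assumes the edges A→D, B→D, C→E, and D→E as drawn. The function `active_cells` and its `hard_disabled` keyword are hypothetical names for illustration only and do not correspond to any Pluto feature or API.

```julia
# Toy model of the five-cell example; not Pluto's implementation.
# parents[c] lists the cells that c reads from (A→D, B→D, C→E, D→E).
parents = Dict(
    :A => Symbol[],
    :B => Symbol[],
    :C => Symbol[],
    :D => [:A, :B],
    :E => [:C, :D],
)

# Starting from the manually added cells (assumed to be roots here),
# keep adding any cell whose parents are all active, and never add a
# cell that was hard-disabled.
function active_cells(parents, added; hard_disabled = Set{Symbol}())
    active = setdiff(Set(added), hard_disabled)
    changed = true
    while changed
        changed = false
        for (cell, ps) in parents
            cell in active && continue
            cell in hard_disabled && continue
            isempty(ps) && continue          # roots are only added manually
            if all(p -> p in active, ps)
                push!(active, cell)
                changed = true
            end
        end
    end
    return active
end

all_on    = active_cells(parents, [:A, :B, :C])   # all five cells, E included
without_E = active_cells(parents, [:A, :B, :C]; hard_disabled = Set([:E]))
```

Under these assumptions, `without_E` contains A, B, C, and D, at the cost of four manual actions: three adds plus one hard-disable.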

If you need more interaction than disabling a 12-minute subgraph until parent cells are ready, you need to insert breaks into the long-running program to interact with its state and conditionally reverse progress, a workflow not suited for notebooks.

It’s possible to split smaller distinct steps across cells in Jupyter and rerun some state-changing cells to simulate a loop, but:

  1. Tweaking state in cells isn’t feasible for many changes and reverses.
  2. Although cells indicate manual execution order, it’s still easy for the program to be in an incorrect state that is hard to see in scattered and likely obsolete cells.
  3. Manual execution order of cells isn’t reproducible, so you need to refactor an updated program into another notebook or script anyway.

Pluto is even less suitable because of its design for reproducing one updated program state:

  1. We can’t assign the same variable in multiple cells because no objective or manual order exists for a reproducible state.
  2. Each cell only has one expression to discourage unnecessary dependencies and to visualize more program state. This contributes to the tendency for Pluto notebooks to be even shorter and quicker (ideally seconds at most) than Jupyter notebooks.

All that said, notebook limitations can be pushed in practice. People have shared hours-long Jupyter notebooks with me that weren’t worth rewriting, so “if it works, it works.”
