Running Pluto notebooks sequentially

I have several Pluto notebooks as part of a larger data-processing pipeline.
‘01_QualityControl.jl’
‘02_BatchCorrection.jl’
etc.

I love it! Highly recommended! However, now I want to write a script that runs them all sequentially, to make sure that the output is reproducible.

In principle, I guess they are just Julia scripts, but in practice each one has its environment and its evaluation order nicely managed. What is the fastest way to get to a command-line solution that looks like

‘julia all_pipeline_notebooks.jl’ ?

This is not an issue. Pluto.jl takes care of this and saves the cells in evaluation order, so you can actually just run the file.

However I am not sure about the “managed environment” part… I think you’d anyways want to create a dedicated environment for each notebook to take full advantage of precompilation and to have better control over the packages’ versions in a production setting.


I think you’d anyways want to create a dedicated environment for each notebook to take full advantage of precompilation

Seems like there’s precompilation already in the notebook environment. Everything is snappy and fun on the second run, even with heavy dependencies! Julia 1.10+ and the latest Pluto releases are truly amazing.

As advertised, the TOML is saved right there in the notebook text, so I guess I should find the TOML, save as a .toml and then activate that environment and then run the script? (only guessing, so let me know if this sounds correct)

Then, I’d further guess/hope that this environment should already be precompiled from running the notebook the first time? It seems like the possibility of this direct conversion is implicitly part of the design of Pluto, but the procedure for actually converting a notebook to a script while maintaining its environment is not explicit in the docs.
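For anyone who wants to try the "find the TOML and save it" route by hand, here is a minimal sketch. It assumes the notebook file stores its environment in the string variables PLUTO_PROJECT_TOML_CONTENTS and PLUTO_MANIFEST_TOML_CONTENTS, as recent Pluto versions do. The helper names extract_embedded_toml and write_notebook_environment are made up for illustration and are not part of Pluto’s API, and the demo "notebook" is a tiny stand-in text rather than a real notebook.

```julia
# Hypothetical helper (not Pluto API): return the text stored in the notebook
# variable `name`, or `nothing` if the notebook text has no such assignment.
function extract_embedded_toml(notebook_text::AbstractString, name::AbstractString)
    opener = name * " = \"\"\"\n"
    found = findfirst(opener, notebook_text)
    found === nothing && return nothing
    start = nextind(notebook_text, last(found))
    closer = findnext("\"\"\"", notebook_text, start)
    closer === nothing && return nothing
    return notebook_text[start:prevind(notebook_text, first(closer))]
end

# Hypothetical helper: write whatever embedded TOML is found into `env_dir`.
function write_notebook_environment(notebook_path::AbstractString, env_dir::AbstractString)
    text = read(notebook_path, String)
    for (name, file) in (("PLUTO_PROJECT_TOML_CONTENTS", "Project.toml"),
                         ("PLUTO_MANIFEST_TOML_CONTENTS", "Manifest.toml"))
        toml = extract_embedded_toml(text, name)
        toml === nothing || write(joinpath(env_dir, file), toml)
    end
    return env_dir
end

# Stand-in for a notebook file (assumption: real notebooks contain much more).
demo_text = join([
    "data = 1",
    "PLUTO_PROJECT_TOML_CONTENTS = \"\"\"",
    "[compat]",
    "julia = \"1.10\"",
    "\"\"\"",
    "",
], "\n")

demo_notebook = joinpath(mktempdir(), "demo_notebook.jl")
write(demo_notebook, demo_text)
env_dir = write_notebook_environment(demo_notebook, mktempdir())
println(read(joinpath(env_dir, "Project.toml"), String))

# With a real notebook, the next step would be:
#   import Pkg; Pkg.activate(env_dir); Pkg.instantiate()
```

This only covers the extraction step; whether the resulting environment reuses the precompilation cache from the notebook session is a separate question that this sketch does not answer.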

In ExampleJuggler.jl (GitHub: j-fu/ExampleJuggler.jl, “Help to maintain Julia code examples for Documenter and CI”) I provide the possibility to run notebooks as files in tests (and in the Test environment). To prevent interference, they are wrapped into modules before execution.


Well, that sounds correct to me, but there could be an easier way that we are missing. I am confident that this approach should work, and it could be automated rather easily.

You could use Pluto.activate_notebook_environment (see the last section of 📦 Packages: advanced — Pluto.jl)


I tried

import Pluto
import Pkg

pipeline = [
    "01_QualityControl.jl",
    "02_BatchCorrections.jl"
    # ...
]


for file in pipeline
    println("running $file ")
    Pluto.activate_notebook_environment(joinpath(@__DIR__,file))
    include(joinpath(@__DIR__,file))
    Pkg.activate()
    println("finished $file ")
end

This almost works, except that the variables from different notebooks interact with each other.

The using statements from one notebook to the next, maybe with slightly different package versions, seem to introduce collisions / undefined errors. So I think I need to combine this with @j-fu’s idea of wrapping them in modules. Do I need a macro to do this, or can I just modify the script above? I guess I can unroll the loop.

I run the notebooks as scripts. This means that it is necessary to do this in an environment which has all necessary packages.

Wrapping into a module can go like this:

This can be run via eval (see L11 above).


You could use Module() to generate an anonymous module and Base.include to evaluate the file in that module instead.

function wrap_include(path)
    m = Module()
    Base.include(m, path)
    return m
end
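To see what the anonymous module buys you, here is a small self-contained sketch. The two temporary files are stand-ins for notebooks (an assumption for the demo), and each defines a global with the same name. Including them through wrap_include keeps the two definitions apart and leaves the calling module untouched.

```julia
# Same helper as above: evaluate a file inside a fresh anonymous module.
function wrap_include(path)
    m = Module()
    Base.include(m, path)
    return m
end

# Two stand-in "notebooks" that both define a global called `result`.
dir = mktempdir()
first_file = joinpath(dir, "first.jl")
second_file = joinpath(dir, "second.jl")
write(first_file, "result = 1\n")
write(second_file, "result = \"two\"\n")

first_module = wrap_include(first_file)
second_module = wrap_include(second_file)

# Each module keeps its own binding; nothing leaks into the caller.
println(first_module.result)
println(second_module.result)
println(isdefined(@__MODULE__, :result))
```

Note that this isolates names only. Packages are still loaded once per Julia process, so two files that ask for different versions of the same package will not be separated by this.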

I still haven’t gotten things working right, though the Pluto scripts finish cleanly. In an attempt to isolate the problem, I have unrolled the loop:

import Pluto
import Pkg

function wrap_include(path)
    println("evaluating $path")
    m = Module()
    Base.include(m, path)
    println("finished $path")
    return m
end


pipeline = [
    "01_QualityControl.jl", #1
    "02_BatchCorrections.jl" 
   #...
]

Pluto.activate_notebook_environment(joinpath(@__DIR__, pipeline[1]));
wrap_include(joinpath(@__DIR__, pipeline[1]))

Pluto.activate_notebook_environment(joinpath(@__DIR__, pipeline[2]));
wrap_include(joinpath(@__DIR__, pipeline[2]))

However, I am still encountering some errors, mostly relating to performance jiujitsu inside Makie and GraphMakie

 - evaluating /Users/colinhl/Desktop/MCL23/PRIMAcyte/03_CelltypeAnnotation.jl
ERROR: LoadError: DimensionMismatch: No precise constructor for GeometryBasics.Pointf found. Length of input was 2.
Stacktrace:
   [1] _no_precise_size(SA::Type, x::Tuple{Float64, Float64})
    @ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:169
  [2] construct_type(::Type{GeometryBasics.Pointf}, x::StaticArrays.Args{Tuple{Tuple{Tuple{Float64, Float64}}}})
    @ StaticArrays ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:89
  [3] StaticArray (repeats 2 times)
    @ ~/.julia/packages/StaticArrays/MSJcA/src/convert.jl:173 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GeometryBasics/ebXl0/src/fixed_arrays.jl:0 [inlined]
  [5] (GeometryBasics.Pointf)(x::GeometryBasics.Point{2, Float64})
    @ GeometryBasics ~/.julia/packages/GeometryBasics/ebXl0/src/fixed_arrays.jl:76
  [6] (::GraphMakie.var"#20#47")(p::GeometryBasics.Point{2, Float64})
    @ GraphMakie ./none:0
  [7] iterate
    @ ./generator.jl:47 [inlined]
  [8] collect
    @ ./array.jl:834 [inlined]
  [9] #19
    @ ~/.julia/packages/GraphMakie/JuRfL/src/recipes.jl:220 [inlined]
 [10] (::GraphMakie.var"#19#46")(arg1#226::Graphs.SimpleGraphs.SimpleDiGraph{…}, arg2#227::NetworkLayout.Buchheim{…})
    @ GraphMakie ./none:0
 [11] #map#13
    @ ~/.julia/packages/Observables/YdEbO/src/Observables.jl:570 [inlined]
 [12] map(f::GraphMakie.var"#19#46", arg1::Observables.Observable{…}, args::Observables.Observable{…})
    @ Observables ~/.julia/packages/Observables/YdEbO/src/Observables.jl:568
 [13] plot!(gp::MakieCore.Combined{GraphMakie.graphplot, Tuple{Graphs.SimpleGraphs.SimpleDiGraph{Int64}}})
    @ GraphMakie ~/.julia/packages/GraphMakie/JuRfL/src/recipes.jl:213
 [14] plot!(scene::Makie.Scene, P::Type{…}, attributes::MakieCore.Attributes, input::Tuple{…}, args::Observables.Observable{…})
    @ Makie ~/.julia/packages/Makie/RgxaV/src/interfaces.jl:398
 [15] plot!(scene::Makie.Scene, P::Type{…}, attributes::MakieCore.Attributes, args::Graphs.SimpleGraphs.SimpleDiGraph{…}; kw_attributes::@Kwargs{})
    @ Makie ~/.julia/packages/Makie/RgxaV/src/interfaces.jl:310
 [16] plot!(scene::Makie.Scene, P::Type{…}, attributes::MakieCore.Attributes, args::Graphs.SimpleGraphs.SimpleDiGraph{…})
    @ Makie ~/.julia/packages/Makie/RgxaV/src/interfaces.jl:275
 [17] get_axis(fig::Any, P::Any, axis_kw::Dict, plot_attr::Any, plot_args::Any)
    @ Makie ~/.julia/packages/Makie/RgxaV/src/figureplotting.jl:46
 [18] plot(P::Type{…}, args::Graphs.SimpleGraphs.SimpleDiGraph{…}; axis::@NamedTuple{}, figure::@NamedTuple{}, kw_attributes::@Kwargs{…})
    @ Makie ~/.julia/packages/Makie/RgxaV/src/figureplotting.jl:65
 [19] graphplot(args::Graphs.SimpleGraphs.SimpleDiGraph{…}; attributes::@Kwargs{…})
    @ GraphMakie ~/.julia/packages/MakieCore/tAY2U/src/recipes.jl:35
 [20] treeplot(cluster_tree::Dict{Int64, Main.anonymous.CellType})
    @ Main.anonymous ~/Desktop/MCL23/PRIMAcyte/03_CelltypeAnnotation.jl:379
 [21] top-level scope
    @ ~/Desktop/MCL23/PRIMAcyte/03_CelltypeAnnotation.jl:559

I’m not sure what changes when going from Pluto to a bare script, but I am also getting some extension-loading warnings (not fatal) which aren’t there in Pluto. There seem to be some subtle changes, even inside the anonymous module as suggested by @savq :sweat_smile:.

Please let me know if these kinds of effects are recognizable as coming from some known environment change or eval-related modification in Pluto that I can further mimic in my script.


Which Makie version do you use?


The error actually is thrown from a GraphMakie call. I think maintenance there has been a little more sporadic than mainstream Makie. This is on Julia 1.10.5.

[13f3f980] CairoMakie v0.12.13
[1ecd5474] GraphMakie v0.5.12

Thanks for taking a look! I will look through Pluto docs again this morning.

The issue is more subtle than I thought and has been hard to isolate: notebook 2 and notebook 3 break only after running notebook 1. I can’t prevent the errors even with a fresh environment and the enclosing module. But if I drop notebook 1, things work. :see_no_evil: It does not seem to be a namespace issue but rather a compilation issue.

Here is the status of notebook 1:

  [336ed68f] CSV v0.10.14
⌅ [13f3f980] CairoMakie v0.10.12
⌅ [a93c6f00] DataFrames v1.6.1
⌅ [a09fc81d] ImageCore v0.9.4
  [916415d5] Images v0.26.1
  [2ab3a3ac] LogExpFunctions v0.3.28
  [b8a86587] NearestNeighbors v0.4.20
  [7f904dfe] PlutoUI v0.7.60
  [2913bbd2] StatsBase v0.34.3
⌅ [731e570b] TiffImages v0.8.0

Status of notebook 2:

  [336ed68f] CSV v0.10.14
  [13f3f980] CairoMakie v0.12.14
  [a93c6f00] DataFrames v1.7.0
  [31c24e10] Distributions v0.25.112
  [a09fc81d] ImageCore v0.10.2
  [033835bb] JLD2 v0.5.5
  [2ab3a3ac] LogExpFunctions v0.3.28
  [6f286f6a] MultivariateStats v0.10.3
  [7f904dfe] PlutoUI v0.7.60
  [6f49c342] RCall v0.14.6
  [2913bbd2] StatsBase v0.34.3
  [731e570b] TiffImages v0.10.2
  [37e2e46d] LinearAlgebra

So there are updates being blocked, but they should be resolved with the environment change, right?

If you want to run all notebooks in the same Julia process you kinda need to have a single environment with all dependencies or it will always be brittle (consider that Julia doesn’t support multiple versions of the same package).

Another workaround could be to use Distributed.jl to run the notebooks in separate processes. Then each process loads the appropriate environment and runs the notebook.
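A rough sketch of the separate-process idea follows, using plain child processes started with run rather than Distributed.jl (a deliberate simplification on my part). It assumes Pluto is installed in the environment the child process starts in, so that import Pluto works before the notebook environment is activated. The names notebook_command and run_pipeline are made up for illustration.

```julia
# Build the command that runs one notebook in a fresh Julia process.
# The child first activates the notebook's own environment, then runs the file.
function notebook_command(notebook_path::AbstractString)
    script = """
        import Pluto
        Pluto.activate_notebook_environment(ARGS[1])
        include(ARGS[1])
        """
    return `$(Base.julia_cmd()) -e $script $(abspath(notebook_path))`
end

# Run every notebook in order, one process per notebook.
function run_pipeline(paths)
    for path in paths
        println("running $path")
        run(notebook_command(path))  # throws if the child exits with an error
        println("finished $path")
    end
end

# Usage (not executed here, since it needs the real notebooks):
#   run_pipeline(joinpath.(@__DIR__, ["01_QualityControl.jl", "02_BatchCorrections.jl"]))
println(notebook_command("01_QualityControl.jl"))
```

Because every notebook gets its own process, each one loads its own package versions. The price is that nothing is shared in memory between steps, so the notebooks have to hand data to each other through files.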


I see, I should have used the Pkg.activate(common_project_env) structure laid out in the Pluto docs. Thanks for the help!

So! The kind of in-process environment multiplexing I would like is not currently possible, but couldn’t the package namespace be unambiguously resolved by the currently activated environment? Basically, is this mid-script multiplexing incompatible in principle with Julia/Pkg’s current design, or is it just an undertested corner case?

Well, not if the environments contain different versions of the same package, as is the case for e.g. CairoMakie in the two environments you posted above. This is currently a fundamental limitation of Julia’s compilation process. I think there are ways to fix this, but I believe it is not currently a priority for the developers.
I don’t have a clear source on this handy. E.g. the Pkg.jl docs confirm this right at the bottom of the page. Here is an older Discourse thread about this question as well.
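To make the conflict concrete, here is a small sketch that compares the two status listings posted above and reports packages present in both environments at different versions. The version numbers are copied from those listings. The function name version_conflicts is made up, and the check covers direct dependencies only, while a thorough check would compare the full manifests.

```julia
# Direct dependencies of the two notebooks, as listed in the status output above.
notebook1 = Dict(
    "CSV" => v"0.10.14", "CairoMakie" => v"0.10.12", "DataFrames" => v"1.6.1",
    "ImageCore" => v"0.9.4", "Images" => v"0.26.1", "LogExpFunctions" => v"0.3.28",
    "NearestNeighbors" => v"0.4.20", "PlutoUI" => v"0.7.60",
    "StatsBase" => v"0.34.3", "TiffImages" => v"0.8.0",
)
notebook2 = Dict(
    "CSV" => v"0.10.14", "CairoMakie" => v"0.12.14", "DataFrames" => v"1.7.0",
    "Distributions" => v"0.25.112", "ImageCore" => v"0.10.2", "JLD2" => v"0.5.5",
    "LogExpFunctions" => v"0.3.28", "MultivariateStats" => v"0.10.3",
    "PlutoUI" => v"0.7.60", "RCall" => v"0.14.6", "StatsBase" => v"0.34.3",
    "TiffImages" => v"0.10.2",
)

# Hypothetical helper: packages that appear in both environments at different versions.
function version_conflicts(a::AbstractDict, b::AbstractDict)
    shared = intersect(keys(a), keys(b))
    return Dict(name => (a[name], b[name]) for name in shared if a[name] != b[name])
end

conflicts = version_conflicts(notebook1, notebook2)
for name in sort(collect(keys(conflicts)))
    println(name, ": ", conflicts[name][1], " vs ", conflicts[name][2])
end
```

Any package that shows up in this list can only be loaded at one of its two versions within a single Julia process, which would be consistent with notebooks 2 and 3 breaking only after notebook 1 has run.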
