Notebooks Need Modules, i.e. Multiple Separate Global Namespaces

For context, a typical workflow for me, using a Pluto or Jupyter notebook, looks like this:

  1. Read in some data / create some initial data
  2. Do some processing
  3. Plot some intermediate results
  4. Do some more processing
  5. Plot final results
  6. Do processing again in a different way (different data, different method, etc.)
  7. Plot intermediate results again
  8. Do some more processing
  9. Plot final results for second version
  10. Compare both versions (often by plotting)

So what’s the problem?

First, I have multiple plots, and plots invariably require variables like ax = Axis(...) or fig = Figure(...). So I either have to give each axis or figure variable a new name, like ax_for_intermediate_results_version1, or wrap the whole plotting expression in a let ... end block, which is my current solution. This solution, however, doesn’t allow any kind of interactive plotting: the whole plot has to be defined in one block, and adding elements piece by piece is not possible. That’s sad because, for me, the whole point of a notebook is a more interactive coding experience, and that includes plotting.

Second, since notebooks are great for experimentation, I often implement things in a few different ways and compare the results. Or I process different data using similar methods. In any case, I often have repeated code, and again I run into issues with namespaces. When I write the second implementation, I have to make sure I change every foo and bar to foo2 and bar2. Pluto helps quite a bit here because it won’t let you assign to the same variable twice. But still, appending some kind of number or letter sequence to every variable is tedious and causes a great number of bugs in my notebooks. You might argue that I should just write more reusable code. Put everything in functions and you’ve solved the problem, right? Yes, you have, but if you can do that, why even use a notebook? What makes notebooks work is that you have a lot of stuff in global scope that you can freely play with and inspect. Reusable code is necessarily less interactive.

What’s my proposed solution?

Implement a chapter/module feature using modules. Allow me to put multiple cells into one module. This way, both versions of my implementation get their own global scope and everything remains interactive because everything is still global and I can easily use code from other modules (or chapters) via the import/using syntax.

Every plot can live in its own little module where I can freely create the plot across multiple cells. I avoid naming conflicts, and I have even better control over what I do and don’t import from the surrounding scope than I would with a let ... end block.
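
In plain Julia, the proposed chapters map onto ordinary modules. Below is a minimal sketch as a plain script, not a Pluto notebook (Pluto currently needs each module ... end block to fit in a single cell); the module names Version1, Version2 and Comparison and the toy data are hypothetical.

```julia
# Each "chapter" is an ordinary Julia module, so it has its own global scope.
module Version1
    data = [1.0, 2.0, 3.0]
    processed = data .* 2      # same names as in Version2 ...
    result = sum(processed)
end

module Version2
    data = [1.0, 2.0, 3.0]
    processed = data .^ 2      # ... but no clash, because this is a separate namespace
    result = sum(processed)
end

# A comparison chapter pulls in the other two with relative imports
# (..Version1 means "Version1 defined in the enclosing module").
module Comparison
    import ..Version1
    import ..Version2
    difference = Version2.result - Version1.result
end

println(Version1.result, " ", Version2.result, " ", Comparison.difference)  # prints 12.0 14.0 2.0
```

Both chapters reuse data, processed and result without any renaming. The proposal essentially asks for the body of each module to be spread over many reactive cells instead of one.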

I just wanted to add that I, too, have had a similar experience. I think your idea makes sense and I’d like to see something like this implemented as well, in order to simplify and streamline the kind of exploratory analyses that you outline.

I would love to see this as well. Is there an issue on the Pluto.jl GitHub repo regarding this? Otherwise I can create one referencing this post. I have this issue constantly too, because when analysing my lab data I have multiple datasets that I’m working with in different parts, and currently I just name them p1_data, p2_data, ...

I would love to wrap each section in a namespace, like a module or something.

I haven’t opened an issue and last time I checked there wasn’t one. Feel free to open one.

Can’t you use modules and local variables? They let you namespace variables and/or restrict variable names to scopes (cells).
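
A minimal sketch of both suggestions, with hypothetical names and data; in Pluto, each commented section below would be its own cell.

```julia
# Cell 1: a module groups related globals under one name,
# but in Pluto the whole module definition has to fit in this one cell.
module Bandwidth
    voltage = [1.0, 2.0, 3.0]   # hypothetical measurements
end

# Cell 2: a let block keeps helper variables local to the cell;
# only the value of the last expression is bound to the global on the left.
bandwidth_no_gain = let gain = 100.0   # hypothetical amplifier gain
    Bandwidth.voltage ./ gain
end
```

Here gain never becomes a global, so another cell can define its own gain without a conflict, while Bandwidth.voltage stays reachable through the module name.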

Yes, but what if I want to add more variables to the module Foo later? Realistically, say I have a module named FreqDepNoise, which contains a variable TotalVoltage, and I want to pass TotalVoltage through a function remove_gain() and then store the result in the FreqDepNoise namespace. I can’t do that. All my code would have to be in one cell: importing data, processing it, plotting results, calculating errors, etc. All of this needs to happen in separate cells, and a module can’t span multiple cells in a Pluto notebook.

Hello, thank you! I’ve raised this issue

Honestly, I deal with this problem the following way:

changed_global_var1, changed_global_var2 = let unchanged_global_var = deepcopy(unchanged_global_var)
    # code here can mutate the local copy of unchanged_global_var;
    # this will not affect the global variable of the same name
    (new_value1, new_value2)  # returned values are bound to the new or changed globals
end
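
A concrete, runnable version of this pattern; the names raw, scaled and total and the data are hypothetical.

```julia
raw = [1.0, 2.0, 3.0]   # hypothetical global dataset

# The block works on a deep copy, so mutating it leaves the global `raw` untouched;
# only the returned tuple is bound to the new globals `scaled` and `total`.
scaled, total = let raw = deepcopy(raw)
    raw .*= 2            # in-place change to the local copy only
    (raw, sum(raw))
end

println(raw)             # the global is unchanged: [1.0, 2.0, 3.0]
```

Without deepcopy, raw .*= 2 would mutate the global array, because the local name would refer to the same object.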

That doesn’t help me, because I still cannot have two variables named voltage. Here is a real example. For my lab course, we studied the dependence of thermal noise on frequency bandwidth, temperature and resistance, so the experiment has three parts. I have three datasets, one with varying bandwidth, one with varying temperature and one with varying resistance, but all measuring some sort of amplified noise voltage. I want to analyse all of them in the same notebook, which includes removing the instrument noise to isolate the noise from the resistor, removing the gain, plotting results and calculating Boltzmann’s constant. Each part requires a different kind of processing. I import the data from an XLSX file, but I have to come up with new names for each of the measured voltages, as well as names for the gain-removed voltages and the original voltages, since I need both during the analysis. So my namespace ends up cluttered with haphazard names, and I end up namespacing mostly with an underscore as a delimiter, which is a bit awkward. I’m not necessarily changing a global variable voltage: I have multiple variables named voltage for the three parts, each of them needs several processing steps, and I need the intermediate variables for plotting. This issue would be solved if I were able to create three separate namespaces.

I hope I explained it clearly; I apologise if it is still difficult to understand.

I have to admit that I am not entirely sure I understand the core of your problem.

If you need three variables related to voltage available in all three parts, then create three global variables named voltage_something where something describes exactly what you are storing.

If these three variables appear in each of the three parts, but have distinct values/meanings in each of them, then encapsulate each part inside a let ... end block and keep the three variables local to each block. That way you can use the same names inside each block, and they have no relation to the variables of the same name in other blocks.

Can you give an example with sample code?
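
A minimal sketch of the let-block suggestion, using two of the three parts of the thermal-noise experiment; the variable names, the gain value and the data are hypothetical placeholders, and sum stands in for the real processing.

```julia
# Hypothetical placeholder data and gain for two of the three parts.
bandwidth_raw = [2.0, 4.0, 6.0]
temperature_raw = [3.0, 6.0, 9.0]
gain = 2.0

# Each part reuses the names `voltage` and `voltage_no_gain` without clashing,
# because both are local to their let block; only the final value becomes a global.
bandwidth_total = let voltage = bandwidth_raw
    voltage_no_gain = voltage ./ gain
    sum(voltage_no_gain)
end

temperature_total = let voltage = temperature_raw
    voltage_no_gain = voltage ./ gain
    sum(voltage_no_gain)
end
```

The catch, as raised earlier in the thread, is that voltage_no_gain is not visible outside its block, so it cannot be inspected or plotted from other cells.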

You can add new elements to a module from outside using @eval:

# ╔═╡ 864bf160-6b37-11ec-0558-194e94ff7758
module Mod
	foo = "foo!"
end

# ╔═╡ 41fadcea-bd32-485a-8ae9-ced8ddd4043c
Mod.foo

# ╔═╡ 7a4c1856-921d-49bf-b177-68be4c911006
@eval Mod baz = "baz"

# ╔═╡ e3c093dc-f4a3-4aa7-be37-7a36735af6bc
Mod.baz

This works for every module, including Julia standard modules.

Edit: Pluto’s cell dependency analysis cannot track modifications made to other modules with @eval, so you need to make sure manually that the code is executed in the right order.
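
Applied to the FreqDepNoise case from earlier in the thread, a sketch might look like this; remove_gain, its gain value and the data are hypothetical, since the thread never defines them. It also shows the function form Core.eval, which takes the target module and a quoted expression.

```julia
module FreqDepNoise
    TotalVoltage = [10.0, 20.0, 30.0]   # hypothetical measurements
end

# Hypothetical helper; the thread does not define remove_gain.
remove_gain(v; gain = 10.0) = v ./ gain

# Macro form: the value is computed here and interpolated with $,
# then the assignment runs inside FreqDepNoise.
@eval FreqDepNoise GainRemoved = $(remove_gain(FreqDepNoise.TotalVoltage))

# Function form: the same mechanism via Core.eval with a quoted expression,
# which runs entirely inside FreqDepNoise (so GainRemoved resolves there).
Core.eval(FreqDepNoise, :(Peak = maximum(GainRemoved)))
```

As the edit above notes, Pluto does not see these cross-cell modifications, so the cells must be run in order by hand.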

Hello! Yes, I’ll share some sample code soon; I just have to tidy it up a bit.

But if there isn’t reactivity, I may as well just use Jupyter notebooks :frowning:

Right :wink:

@ingredients could be a way to get namespaces in the future (it is currently still experimental).

It allows you to dynamically import other notebooks, where each notebook is a separate namespace.

Alternatively, notebooks could be used as REST endpoints from other notebooks with the pull request "What you see is what you REST" by ctrekker (fonsp/Pluto.jl #1052), which unfortunately is not merged yet.
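
The core idea behind @ingredients, loading a file into its own fresh module, can be sketched in plain Julia. This is not the actual implementation of @ingredients (which also has to handle Pluto’s notebook format); the helper load_as_module, the temporary file and its contents are hypothetical stand-ins for a real notebook.

```julia
# Hypothetical stand-in for a notebook file: write a tiny script to a temporary path.
path, io = mktemp()
write(io, "voltage = [1.0, 2.0, 3.0]\nmean_voltage = sum(voltage) / length(voltage)\n")
close(io)

# Evaluate the file inside a brand-new module, so its globals get their own namespace.
function load_as_module(path::AbstractString)
    m = Module(gensym(:notebook))   # fresh module with the standard `using Base`
    Base.include(m, path)           # run the file's top-level code inside m
    return m
end

part1 = load_as_module(path)
part2 = load_as_module(path)        # a second, fully independent copy
println(part1.mean_voltage)
```

Each call yields a separate module, so part1.voltage and part2.voltage never clash, and nothing from the file leaks into the calling namespace.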