For context, a typical workflow for me, using a Pluto or Jupyter notebook, looks like this:
Read in some data / create some initial data
Do some processing
Plot some intermediate results
Do some more processing
Plot final results
Do processing again in a different way (different data, different method, etc.)
Plot intermediate results again
Do some more processing
Plot final results for second version
Compare both versions (often by plotting)
So what’s the problem?
First, I have multiple plots, and because plots invariably require variables like ax = Axis(...) or fig = Figure(...), I either have to give each axis or figure variable a new name like ax_for_intermediate_results_version1 or wrap the whole plotting expression in a let ... end block, which is my current solution. This solution, however, doesn’t allow for any kind of interactive plotting. The whole plot has to be defined in one block; adding elements piece by piece is not possible. That’s sad because, for me, the whole point of a notebook is to have a more interactive coding experience, which includes the plotting.
Second, since notebooks are great for experimentation, I often implement things in a few different ways and compare the results. Or I process different data using similar methods. In any case, I often have repeated code. And again, I run into issues with namespaces. When I write the second implementation I have to make sure I change every foo and bar to foo2 and bar2. Pluto helps quite a bit here because it won’t let you assign to the same variable twice. But still, this process of appending some kind of number or letter sequence to every variable is tedious and causes a great number of bugs in my notebooks. You might argue that I should just write more reusable code. Put everything in functions and you’ve solved the problem, right? Yes, you have, but if you can do that, why even use a notebook? The thing that makes notebooks work is that you have a lot of stuff in global scope that you can freely play with and inspect. Reusable code is necessarily less interactive.
What’s my proposed solution?
Implement a chapter/module feature using modules. Allow me to put multiple cells into one module. This way, both versions of my implementation get their own global scope and everything remains interactive because everything is still global and I can easily use code from other modules (or chapters) via the import/using syntax.
Every plot can live in its own little module where I can freely build the plot across multiple cells. I avoid naming conflicts and I have even better control over what I do and don’t import from the surrounding scope than I would with a let ... end block.
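To make the idea concrete, here is a rough sketch in plain Julia (outside of Pluto) of what two "chapters" and a comparison step could look like if each were an ordinary module. The module names Version1, Version2 and Comparison and all the numbers are made up for illustration; this only shows the scoping behaviour I'm after, not how Pluto would or should implement it.

```julia
# Hypothetical "chapter" modules; names and data are invented for illustration.
module Version1
raw = [1.0, 2.0, 3.0]
processed = raw .* 2        # a global, but only inside Version1
end

module Version2
raw = [10.0, 20.0, 30.0]    # same name as in Version1, no clash
processed = raw .+ 1
end

module Comparison
# Relative imports pull in the sibling modules defined above.
import ..Version1, ..Version2
difference = Version2.processed .- Version1.processed
end

println(Comparison.difference)
```

Both versions keep the short names raw and processed, and the comparison refers to them with a qualified name instead of a foo2-style suffix.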
I just wanted to add that I, too, have had a similar experience. I think your idea makes sense and I’d like to see something like this implemented as well, in order to simplify and streamline the kind of exploratory analyses that you outline.
I would love to see this as well. Is there an issue on the Pluto.jl GitHub repo regarding this? If not, I can create one referencing this post. I run into this constantly too: when analysing my lab data I have multiple datasets that I work with in different parts of the notebook, and currently I just name them p1_data, p2_data, ...
Would love to wrap each section in a namespace like a module or something
Yes, but what if I want to add more variables to the module Foo later? Realistically, say I have a module named FreqDepNoise, which contains a field TotalVoltage, and I want to pass TotalVoltage through a function remove_gain() and then store the result in the FreqDepNoise namespace. I can’t do that. All my code would have to be in one cell: importing data, processing it, plotting results, calculating error, etc. All of this needs to happen in separate cells, which modules can’t do in a Pluto notebook.
Sincerely, I deal with this problem the following way:
changed_global_var1, ... = let unchanged_global_var = deepcopy(unchanged_global_var)
    # Code here can modify unchanged_global_var without
    # affecting the global binding
    values_of_new_or_changed_global_vars
end
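As a small concrete instance of this pattern (the variable names and values are just placeholders I picked for the example):

```julia
data = [3, 1, 2]

# The right-hand side deepcopy(data) reads the global; the left-hand side
# creates a new local binding named data that shadows it inside the block.
sorted_data = let data = deepcopy(data)
    sort!(data)    # mutates only the local copy
    data           # value of the block, assigned to sorted_data
end

println(data)
println(sorted_data)
```

The global data is left untouched, and only what the block returns becomes a new global.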
That doesn’t help me because I still cannot have two variables named voltage. I’ll give a real example. For my lab course, we were studying the dependence of thermal noise on frequency bandwidth, temperature and resistance, so the experiment has three parts. I have three datasets, one with varying bandwidth, one with varying temperature, and one with varying resistance, but all measuring some sort of amplified noise voltage. I want to analyse all of these in the same notebook (which includes removing the noise from the instruments to isolate the noise from the resistor, removing gain, plotting results, and calculating Boltzmann’s constant). Each part requires a different kind of processing. I import the data from an XLSX file, but I have to come up with new names for each of the measured voltages, and separate names for the gain-removed voltages and the original voltages, since I need both during the analysis. My namespace ends up cluttered with haphazard names, and I end up namespacing mostly with an underscore as a delimiter, which is a bit awkward. So I’m not necessarily changing a global variable voltage. I have multiple variables named voltage for these three parts, each of them goes through several processing steps, and I need the intermediate variables as well for plotting. This issue would be solved if I were able to create three separate namespaces.
I hope I was able to explain. I apologise if this is still difficult to understand.
I have to admit that I am not entirely sure if I was able to understand the core of your problem.
If you need three variables related to voltage available in all three parts, then create three global variables named voltage_something where something describes exactly what you are storing.
If these three variables appear in each of the three parts but have distinct values/meanings in each, then encapsulate each part inside a let ... end block and make the three variables local to each block. You can then use the same names inside each block, and they have no relation to the variables of the same name in other blocks.
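Something along these lines is what I have in mind. The names bandwidth_result and temperature_result, the gain values and the data are all invented for the sketch, since I don't know your actual processing:

```julia
# Each block has its own local voltage and gain; only the value of the
# block's last expression escapes into a global.
bandwidth_result = let
    voltage = [1.0, 2.0, 3.0]    # placeholder data, local to this block
    gain = 2.0
    voltage ./ gain
end

temperature_result = let
    voltage = [4.0, 6.0]         # same name, unrelated to the one above
    gain = 4.0
    voltage ./ gain
end

println(bandwidth_result)
println(temperature_result)
```

The limitation is that each part has to live in a single block, so intermediate values are only visible from outside if the block returns them.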
You can add new elements to a module from outside using @eval:
# ╔═╡ 864bf160-6b37-11ec-0558-194e94ff7758
module Mod
foo = "foo!"
end
# ╔═╡ 41fadcea-bd32-485a-8ae9-ced8ddd4043c
Mod.foo
# ╔═╡ 7a4c1856-921d-49bf-b177-68be4c911006
@eval Mod baz = "baz"
# ╔═╡ e3c093dc-f4a3-4aa7-be37-7a36735af6bc
Mod.baz
This works for every module, including Julia standard modules.
Edit: Pluto cell dependency analysis cannot track modifications to other modules using @eval, thus you need to make sure manually that the code is executed in the right order.
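Applied to the FreqDepNoise case from earlier in the thread, a sketch in plain Julia could look like the following. The body of remove_gain, the name GainRemoved and the numbers are assumptions on my part, since the real function and data aren't shown here.

```julia
module FreqDepNoise
TotalVoltage = [2.0, 4.0, 6.0]    # made-up measurements
end

# Hypothetical stand-in for the real remove_gain(); the gain value is invented.
remove_gain(v; gain = 2.0) = v ./ gain

# $(...) is evaluated in the current scope first; the resulting array is then
# spliced into the assignment that @eval runs inside FreqDepNoise.
@eval FreqDepNoise GainRemoved = $(remove_gain(FreqDepNoise.TotalVoltage))

println(FreqDepNoise.GainRemoved)
```

The same caveat about execution order applies: nothing here tells Pluto that the cell reading FreqDepNoise.GainRemoved depends on the cell that runs the @eval.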