Interactive prototyping workflows

Whenever I run into this problem, I realize the solution is basically always to use Pkg.generate and make an actual package, then have a scripts folder alongside src that holds the scripts. That way each script needs just a single using (of a package that itself has lots of exports). You can also use this somewhat hacky way to export all names from a module:

julia> module MyMod
           w = 5
       end;

julia> using .MyMod

julia> function exportall(mod)
           for n in names(mod; all=true)
               if Base.isidentifier(n) && n ∉ (Symbol(mod), :eval, :include)
                   @eval mod export $n
               end
           end
       end;

julia> w
ERROR: UndefVarError: `w` not defined in `Main`
Suggestion: check for spelling errors or missing imports.

julia> exportall(MyMod)

julia> w
5

You can also do REPL.activate(MyMod) and work within your module, obviating the need for too many MyMod.xxx prefixes.
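
For anyone who wants the package-plus-scripts layout from the start of this post spelled out, here is a minimal sketch. The package name MyAnalysis and the file scripts/run.jl are made up for illustration, and everything is built in a temporary directory so it doesn't touch your real projects:

```julia
using Pkg

root = mktempdir()                        # throwaway directory for the demo
pkg = joinpath(root, "MyAnalysis")        # hypothetical package name
Pkg.generate(pkg)                         # writes Project.toml and src/MyAnalysis.jl

scripts = joinpath(pkg, "scripts")        # scripts live next to src/
mkpath(scripts)
write(joinpath(scripts, "run.jl"), """
using MyAnalysis    # the single `using` that brings in every export
""")

# Show the resulting layout: Project.toml, scripts/, src/
foreach(println, readdir(pkg))
```

From the directory that contains the package, you would then run the script with something like julia --project=MyAnalysis MyAnalysis/scripts/run.jl.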

I think Pluto.jl is the best way to go. If you need to write modules, you can use Revise.jl in another session, then reload your module in Pluto.jl. It will work following your specs.

I didn’t understand these so I just repeated them for myself.

julia> run(`cat up.jl`);

function foo(x::Vector{Float64}, a::Float64)
    return x .+ a
end


julia> include("up.jl")
foo (generic function with 1 method)

help?> foo     # there is one method for foo
search: foo for floor Bool fdio

  No documentation found for private binding Main.foo.

  foo is a Function.

  # 1 method for generic function "foo" from Main:
   [1] foo(x::Vector{Float64}, a::Float64)
       @ /tmp/up.jl:2
       
julia> foo([1.0], 1.0)
1-element Vector{Float64}:
 2.0

julia> run(`cat up.jl`);    # change the type signature

function foo(x::Vector{<:Real}, a::Real)
    return x .+ (a / 2)
end


julia> include("up.jl")
foo (generic function with 2 methods)

help?> foo    # now there are *two* methods named foo.
search: foo for floor Bool fdio

  No documentation found for private binding Main.foo.

  foo is a Function.

  # 2 methods for generic function "foo" from Main:
   [1] foo(x::Vector{Float64}, a::Float64)
       @ /tmp/up.jl:2
   [2] foo(x::Vector{<:Real}, a::Real)
       @ /tmp/up.jl:2

julia> foo([1.0], 1.0) # calls the older "hidden-state" definition because Float64 is more specific than Real
1-element Vector{Float64}:
 2.0

julia> foo([1.0], 1) # calls the intended function
1-element Vector{Float64}:
 1.5

I think I can live with this. Overwriting definitions works without magic, I just have to remember that the type signature is an essential part of the name of a function, and Julia’s polymorphism means you can fall into calling a ghost if you’re not paying close attention.
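
If the ghost does get in the way, one possible escape hatch short of restarting is to delete the stale method by hand. This is only a sketch: Base.delete_method is unexported and, as far as I know, not part of the documented public API, so treat it as a debugging aid rather than something to build on.

```julia
# Same two definitions as in the transcript above.
foo(x::Vector{Float64}, a::Float64) = x .+ a          # old version
foo(x::Vector{<:Real}, a::Real) = x .+ (a / 2)        # edited version

println(length(methods(foo)))     # both methods are still alive

# Find the method that a Float64 call dispatches to and remove it.
stale = which(foo, Tuple{Vector{Float64}, Float64})
Base.delete_method(stale)

println(foo([1.0], 1.0))          # now reaches the edited version
```

Checking methods(foo) after an edit is the cheap way to notice that an old signature is still hanging around.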

Thanks for the clarification!

Iiiiinteresting. That’s the sort of insight I came here for. I’m curious how common that is. The workflow tips mention writing a package as an option but I figured that was for, you know, writing utilities to share, not a specific data analysis.

If you use Pkg.generate then I guess Revise works perfectly? Or, not, I guess it still doesn’t re-evaluate variables whose definitions haven’t changed even if the inputs to those definitions have.

Thanks for all your contributions everyone!

If you use Revise with this Pkg.generate workflow, it will reload all functions that are defined, but global variables in your script will not be updated. So you would still need to do include("scripts/my_script.jl") to re-run it, and this will pollute your namespace (which can be good or bad depending on your needs).

PS: If someone sent me a Pluto.jl notebook I would be pretty annoyed. It’s a pretty bespoke development environment and I would much rather just get something I can easily run in the terminal.

I envy you if you’re sure you can. I have on many occasions torn my hair out before “just remembering” that I constructed that trap with my own hands.

The behavior of Revise is the same whether you are using a package or includet-ing a script.

I don’t know how common, but that’s what I generally do if I have anything more than a few lines. Only I’d use PackageMaker.jl (being its author) instead of Pkg.generate; see also the general considerations concerning workflows in its documentation as to why it’s better to start with a package right from the beginning. There is one more bullet point to be added: with a package for every task, all dependencies are installed there and not into the main environment, reducing the risk of dependency conflicts.

If you need to play around with global variables, you can use the trick as proposed above by lmiq - here one more variation:

# in the package body
my_data() = (;a=1, b="b", c=[1,2,3])
# in a script
using MyPackage
using MyPackage: foo

(;a) = my_data()
answer = foo(a)

Ooo okay. Noted. I’ll see how far I get. Maybe I will find myself bald in six months.

The hair grows back :grin:

Thank you for launching this discussion. I am also in search of a better workflow. I apologize for the long message, but I thought it would help to explain how I struggle with basic things - after all, this is a “New to Julia” forum.

I mostly work in VSCode, building scripts and testing them interactively by executing lines or sections one at a time, and from time to time restarting the kernel to make sure I am not in some parallel universe. When something gets to a state where it needs to be “properly” documented or shared, I tend to switch to JupyterLab, but I never use it for my own work, I find VSCode scripts with the REPL always ready more convenient. I also transferred from Matlab the habit of saving intermediate results when one part of the analysis works, so I will sometimes end up with scripts like “step1”, “step2”, etc. Very occasionally, I think some functions I created could be useful to someone else or the future me, so I try to code them properly and perhaps put them in a module.

I have played with precompiled sysimages in the past, when it was (for me) reducing the frustration a lot, but I find that they are unnecessary now (again, to me!).

The main problem for me to switch to a “better” workflow that would not do everything in Main is debugging. I have given up on emulating C/Fortran debugging (as in “F10 / F11” in Visual Studio). Infiltrate is working well for me, but I find it’s polluting my code with commented out (or just “forgotten”) @infiltrate instructions, plus it requires me to switch from running lines or sections to running the whole file with include(), so it’s not super convenient.

One strong limitation of my “fully interactive approach” is loops, do blocks, etc. that create their own scope and cannot be evaluated line-by-line. What I am doing - which seems stupid - is to replace loops by manual iterations (as in having a line that says i = i+1, running that, running the section of code that should be in the loop, and then finding the problems with the next value of i). Once the code works for “many values of i”, I put back the actual loop, perhaps move the code to a function, and move on to the next part of the analysis. I typically have to change some stuff because of the scope behaviour but that’s OK.
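
One light alternative to the manual i = i+1 routine, sketched here with a made-up loop body and made-up data, is to move the body into a function of the loop index right away. You can then call it for a single index from the REPL, and the scoping is already the same as it will be in the final loop:

```julia
# Hypothetical loop body, pulled out into a function of the loop index.
function loop_body(M, i)
    A = 3 * M[i]^2 + 4 * M[i]
    return A
end

M = [1.0, 2.0, 3.0]        # made-up data

loop_body(M, 2)            # poke at a single iteration from the REPL

# Once it behaves for a few values of i, the real loop is one line.
results = [loop_body(M, i) for i in eachindex(M)]
```

The names loop_body and M are placeholders; the point is only that a function can be re-run for one index at a time, which a for block cannot.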

I have not been able to find instructions that explain to someone like me how to adopt a better workflow. Back to the feeling that I need a “(really) New to Julia” forum…

If a generous soul has read this far and would have ideas for a “light approach” to adopting a better workflow, going one step at a time, i.e. adopting something that is not yet “good” but at least “better”, that would be really great!

I do feel like include(Module(), "my_script.jl"; interactive = true) would be a cool feature. It would

  1. Create a temporary module (but maintain currently compiled packages)
  2. Run my_script.jl in this temporary module.
  3. At the end of the script, open up a REPL prompt in this temporary module.

That looks like

using REPL; let m = Module()
  Base.include(m, "my_script.jl") # not the module-specific include
  REPL.activate(m)
end

but I’m not sure how well that’ll work for noninteractive julia commands. Maybe this can be a REPL-specific function.

How does GC work here? If I re-run this multiple times, will these old temporary modules get GC’d?

Maybe you can find some useful tips in :slight_smile:

Use my “power of 10 rule.”
If you’re going to use the code 10 or fewer times, a script / macro is fine.
More than that, make it a function. It’ll run cleaner, be more flexible, and be easier to verify.

If you’re very used to working within IDEs my approach might not win you over, but I’ll offer it anyway. I’m a lot more comfortable working in a "CLI"-ey way. This explains my motivation well

(admittedly if you do any I/O at all then there’s external state that might change your program, but as far as your imports go it’s deterministic)

The first thing I do is put this as my shebang

#!/usr/bin/env -S julia --project=.

And keep in my head that ./script.jl is always my entry point. Working this way pushes against using debuggers and towards using logging.

I am much more experienced debugging with prints than debuggers, so there’s some bias here, but I think it’s better! You can save the output and share or compare it. I’ve isolated some deep deep numerical bugs with little knowledge of my libraries simply by adding prints and turning log levels way up and then diffing versions of my output until I pinned down where the versions diverged. I don’t think I could have done that with a debugger; it would have, at least, taken much much longer.

You must have the habit from Matlab of relying on every line that doesn’t end in ; printing its value. In Julia, the REPL does that by wrapping an implicit display() around every top-level value, and I imagine that habit is what’s keeping you working in the REPL in VSCode. But you can just re-add that feature in the non-interactive case!

So I would write e.g.

for i = 1:10
  A = 3*M[i]^2 + 4*M[i]
  display(A)   # DEBUG
end

I know this “pollutes” your code too, but I think it’s worth the trade off, and that’s why I tag them “DEBUG” so that I can either search-and-replace them out later or know to filter them out mentally while I’m skimming code.

@infiltrate (or breakpoint() in python) is useful for me when I need to figure out how to write an expression in the middle of some deeply nested code that I’ve already written. But even there as soon as I figure it out I display() (or print(repr()) in python) it so that as I test my code I can see if my new expression is working.

The main draw of Jupyter/Pluto is that they also add implicit display()s in a file you can share. I kind of wish there was a way to turn that on in the Julia CLI so that julia script.jl behaved like matlab. Of course the other advantage is that Jupyter’s display() is augmented to show interactive graphs and tables which is something the CLI will never be able to do.
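
A rough approximation of that wished-for mode can be hacked together by parsing the script yourself and displaying each top-level value. This is a sketch under stated assumptions, not an existing Julia feature: include_verbose is a name I made up, it does not honor a trailing ; to suppress output, and errors won't carry file and line information the way include does.

```julia
# Evaluate a script one top-level expression at a time and display each
# non-`nothing` result, roughly like the REPL does.
function include_verbose(mod::Module, path::AbstractString)
    src = read(path, String)
    pos = 1
    while pos <= ncodeunits(src)
        ex, pos = Meta.parse(src, pos)      # next expression and where it ends
        ex === nothing && continue          # only whitespace/comments were left
        val = Core.eval(mod, ex)
        val === nothing || Base.invokelatest(display, val)
    end
    return nothing
end

# Demo on a throwaway two-line script.
path = joinpath(mktempdir(), "t.jl")
write(path, "A = 1\nB = A + 1\n")
scratch = Module(:Scratch)                  # keeps Main clean
include_verbose(scratch, path)
```

Passing Main instead of a scratch module would make it behave more like a plain include, with the usual namespace pollution.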


Debugging with print only works if your startup time isn’t that long, which is what motivated me to post this thread. Looking at the Related threads, I’m not the only one who is coming in with the CLI instinct and running into e.g. especially the relatively long time-to-first-plot in Plots.jl and wondering what we’ve been doing wrong.

What I’m hearing is that people lean mostly on not restarting Julia, by using Pluto or the Julia process vscode spawns or Revise.

I think adapting my $ python -i script.py habit to julia> include("script.jl") splits the difference well enough, though as Nathaniel mentioned in the thread I linked it’s not 100% clean because you could e.g. mess up some internal setting inside of Plots that can only be cleared by a full restart; but at least this way a full restart is reliable. Since I am always aiming at getting my final results from ./script.jl I make sure to test that periodically, and during dev include("script.jl") is 99% the same. There’s no extra commands that need running or environment variables that need setting or virtualenvs that need loading or code servers that need launching.

Thanks for your tips. The CLI workflow looks very much like my Matlab workflow, because I never got used to running code line by line or by sections in the Matlab editor.
I see a benefit in comparing detailed debug log to see where they start diverging (but I am a bit afraid of “polluting” my code with a lot of print statements!).

Yes exactly, that’s why I need a debugger, sometimes seeing the result of a line is not enough for me to fix the next line, I need to experiment with the syntax. And you’re right, @infiltrate does a great job here.

I think I will try to give up running lines of code one line at a time or by sections, and instead try to always use include("script.jl") to run it. That way, I still will have to deal with @infiltrate statements lying around, but at least I won’t have to deal with the differences between interactive and not interactive behaviours for scope. But I will still rely on “collapsible” sections in VSCode with #region, I find that really cool for longish scripts.

(By the way, I also learned thanks to your answer how to quote part of a post, that’s cool, thanks!).

I followed this on a flight of fancy. In my startup.jl:

if isinteractive()
# https://discourse.julialang.org/t/interactive-prototyping-workflows/133265/30
import REPL
struct InteractiveMode end
Interactive = InteractiveMode()
(::Base.IncludeInto)(script::String, ::Val{Interactive}) = begin
  let m = Module(Symbol(script))
    Base.include(m, script) # not the module-specific include
    REPL.activate(m)
  end
end
(::Base.IncludeInto)(script::String, interactive::InteractiveMode) = include(script, Val(interactive))
end

I tried to make include(script; interactive=true) work but I couldn’t get the magic of Val() to work with keyword arguments. I guess you don’t really need the type-based multiple dispatch, that was the flight of fancy part.

Demo:


julia> run(`cat t.jl`);

A = 1
display(A)

julia> include("t.jl", Interactive)
1

(Main.var"t.jl") julia> A
1

(Main.var"t.jl") julia> A = 8
8

(Main.var"t.jl") julia> Main.REPL.activate(Main)  # quit

julia> A  # namespace was not polluted
ERROR: UndefVarError: `A` not defined in `Main`
Suggestion: check for spelling errors or missing imports.

I’m not sure if I’ll like it, yet. REPL.activate doesn’t nest; rather, it simply updates a global pointer to what module you’re interacting with, so to step out of it you have to explicitly step back to the previous module you were in; and also you need to call it Main.REPL.activate when you’re inside the other module. I wish I could Ctrl-D to get back to Main.

I’m not sure either how this interacts with garbage collection. It’s neat though!

I feel like this is worth branching into another thread if you want to continue developing this.

I’m learning too. Julia’s built in logging system is simple and unobtrusive and good.

Instead of

display(value)  # DEBUG

write

@debug value

The print is suppressed unless you launch Julia this way

JULIA_DEBUG=Main julia

or set that variable near the top of your script

ENV["JULIA_DEBUG"] = "Main"

The @debug should make it super clear that readers don’t need to get distracted with it, and it doesn’t pollute your output at all unless asked and you don’t need to be commenting things in and out all the time.
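
If you would rather not touch environment variables, the Logging standard library can also switch debug output on for a single block. A minimal sketch, where halve is a made-up function standing in for your own code:

```julia
using Logging

function halve(x)
    y = x / 2
    @debug "inside halve" x y     # silent unless debug logging is enabled
    return y
end

halve(3.0)                        # prints nothing by default

# Turn debug messages on for just this block, without touching ENV.
with_logger(ConsoleLogger(stderr, Logging.Debug)) do
    halve(3.0)
end
```

The extra names after the message (x and y here) are logged as key = value pairs, which makes the output easy to diff between runs.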

(python has controllable debug logging too but I never use it just because I never remember the boilerplate you need to set it up the right way)

Pluto notebooks are normal scripts though, with some cell IDs and the Project/Manifest included. It has been made with beginners and teaching in mind, that sets the right expectations I guess. Anyway, you just need to try a little demo project and see how the workflow feels.

### A Pluto.jl notebook ###
# v0.20.19

using Markdown
using InteractiveUtils

# ╔═╡ 71b8e133-4931-4069-b64b-a44fcbd3cc87
using Plots, XLSX, DataFrames

# ╔═╡ dd4cdb8d-ffc7-4f21-b8ef-f42dc005d95f
# load some data

# ╔═╡ 293a2b63-609e-4d85-a916-1b55c3806a84
# drop mangled parts

# ╔═╡ fd466b43-e4f8-4f7e-b869-9d4a269732ee
# calculate some statistics

# ╔═╡ abd9a842-3845-4c56-942c-139bd90dd0f7
# plot some stuff
plot(x, y,
  title="....",
  xlabel="...",
  ylabel="...")

# ╔═╡ 4617ff35-5dfa-4bbf-bc75-b6169594e75a
png("savefig.png")

# ╔═╡ 00000000-0000-0000-0000-000000000001
PLUTO_PROJECT_TOML_CONTENTS = """
[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
XLSX = "fdbf4ff8-1666-58a4-91e7-1b58723a45e0"

[compat]
DataFrames = "~1.8.0"
Plots = "~1.41.1"
XLSX = "~0.10.4"
"""

# ╔═╡ 00000000-0000-0000-0000-000000000002
PLUTO_MANIFEST_TOML_CONTENTS = """
# This file is machine-generated - editing it directly is not advised

julia_version = "1.12.0"
manifest_format = "2.0"
project_hash = "20629648fbf85db117eb3241953d12233935e599"

[[deps.AliasTables]]
deps = ["PtrArrays", "Random"]
git-tree-sha1 = "9876e1e164b144ca45e9e3198d0b689cadfed9ff"
uuid = "66dad0bd-aa9a-41b7-9441-69ab47430ed8"
version = "1.1.3"

# Lots of deps...
"""

# ╔═╡ Cell order:
# ╠═71b8e133-4931-4069-b64b-a44fcbd3cc87
# ╠═dd4cdb8d-ffc7-4f21-b8ef-f42dc005d95f
# ╠═293a2b63-609e-4d85-a916-1b55c3806a84
# ╠═fd466b43-e4f8-4f7e-b869-9d4a269732ee
# ╠═abd9a842-3845-4c56-942c-139bd90dd0f7
# ╠═4617ff35-5dfa-4bbf-bc75-b6169594e75a
# ╟─00000000-0000-0000-0000-000000000001
# ╟─00000000-0000-0000-0000-000000000002

Yeah. That looks like a pain to edit in a text editor. I would have to make sure not to mess up any of these cell UUIDs or the large Pluto TOML block at the end. I would much prefer someone send me a .zip of a package (or a GitHub link, obviously).

For interactive work I always start up julia with:

julia --project=. -O0 -t auto

This keeps track of package versions, reduces compilation times and gives me more than 1 thread.
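
To confirm from inside the session that those flags took effect, something like the following should do. Note that Base.JLOptions is an internal, so take that line as an assumption about current Julia versions rather than a stable API:

```julia
println("project:   ", Base.active_project())       # path picked up from --project
println("threads:   ", Threads.nthreads())          # set by -t / --threads
println("opt level: ", Base.JLOptions().opt_level)  # set by -O
```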

Simple prototyping is done just in a folder dedicated to the given task, but as soon as I think I know what I’ll be doing, I create a project with PkgTemplates.jl