Julia REPL flow coming from Matlab

Hi, I’m new to the Julia community and I have a question about how to use the REPL to interface with my scripts.

In Matlab, it’s easy to get into a folder and, for example, set a sequence of scripts called A.m B.m and C.m.

Let’s say A.m loads a bunch of CSV files into matrices in memory,
B.m is a processing script which manipulates those matrices somehow, and
C.m is a cleanup script which plots data, saves figures, makes reports, etc.

Usually, I’m able to load A.m once in my workspace and keep iterating on B.m in any way I want, using “clear” on unwanted variables as I go.

Can I do that same workflow in Julia? Clearly, I can’t simply type "B " in REPL and make it call B.jl in the folder.

Also, even if I can include("B.jl") or run() it somehow, does Julia have a “clear” to control what I want removed from the current “workspace”?

Thank you so much in advance!

You can’t remove variables from the workspace.

I’d suggest either putting the scripts in a package and using Revise.jl, or includeting (note the t; includet is from Revise.jl) the scripts.
This will automatically run code you changed as you edit and save the scripts.


Hi and welcome!

If you haven’t found it already, this link should be useful: https://docs.julialang.org/en/v1/manual/noteworthy-differences/.

I don’t have much Matlab experience, but you probably want to organize things differently than you would have in Matlab, especially older-school Matlab in which you have one function per file. In Julia you want to organize your code into functions, so create functions A, B, and C that do those read, process and output tasks (and each of those should probably not be large monolithic blocks of code but can also be composed of relatively small functions).

While it’s true that you can’t clear a variable, you can redefine variables and functions at will, and they will replace the old definitions (assuming the new function has the same signature - see multiple dispatch). It is a hassle if you are making structs, as they can’t be redefined.
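To make the "no clear, but you can redefine" point concrete, here is a minimal sketch. The names `data` and `scale` are made up for illustration; rebinding a variable to `nothing` is the usual way to let the garbage collector reclaim a large array.

```julia
# Hypothetical stand-in for a large matrix loaded from a CSV file.
data = rand(1000, 1000)

# There is no `clear data`, but rebinding the name lets the garbage
# collector reclaim the old array.
data = nothing
GC.gc()  # optional: request a collection right away

# Redefining a method with the same signature replaces the old one.
scale(x) = 2 .* x
first_result = scale([1, 2, 3])    # uses the first definition

scale(x) = 10 .* x                 # same signature, new body
second_result = scale([1, 2, 3])   # uses the new definition
```

The name `data` still exists afterwards; only the memory it pointed to is released.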

Another approach you might look into is Pluto.jl. It is a reactive notebook environment for Julia that I find to be very productive for the kind of task you describe (pull in data from .csv or database, manipulate it, make plots or report). It uses some magic under the covers to allow you to delete anything.

Good luck!


I’m looking at something I did recently that looks a lot like what you described and I organized it like this:

using CSV
using Plots

function readmyfile(fn)
    ...
end

function smooth(data)
    ...
end

function findevents(data)
    ...
end

function makeeventtable(data)
    ...
end

function smoothet(et)
    ...
end

function process(fn)
    output = fn |> readmyfile |> smooth |> findevents |> makeeventtable |> smoothet
    plot(output)
end

process("myfile.csv")

As Elrod mentioned, there is no such thing as a “clear”, but you can redefine your variables at will:

data = [ 1, 2, 3 ]
f(data)
data = [ 2, 3, 4 ]
f(data)

Assuming that your data is constant, as you described initially, you could simply re-include the scripts containing the analysis and plotting/report functions as you fiddle with the analysis functions, something like:

include("set_data.jl")
include("analyze.jl")
include("report.jl")
#--- change something in "analyze.jl"
include("analyze.jl")
include("report.jl")

where the analyze.jl and report.jl files include both the functions and the call to those functions using the data variables. I use that frequently for fast exploration of code.
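A self-contained sketch of that loop, in case it helps: it writes two throwaway files into a temporary folder (the file contents and the `analyze` function are invented for the demo) and re-includes only the analysis file after "editing" it.

```julia
# Emulate the "edit analyze.jl, then re-include it" loop in a temp folder.
dir = mktempdir()
data_file    = joinpath(dir, "set_data.jl")
analyze_file = joinpath(dir, "analyze.jl")

# Hypothetical file contents: one script sets the data, the other
# defines a function and calls it on that data.
write(data_file, "data = [1, 2, 3]\n")
write(analyze_file, "analyze(x) = sum(x)\nresult = analyze(data)\n")

include(data_file)      # load the data once
include(analyze_file)   # defines analyze and computes result
first_run = result

# "Edit" analyze.jl, then re-include only that file; data stays loaded.
write(analyze_file, "analyze(x) = sum(abs2, x)\nresult = analyze(data)\n")
include(analyze_file)
second_run = result
```

The data script runs only once, which is the part that matters when loading is slow.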

When things get more complicated, you probably want to use Revise (or even before things get complicated). With Revise you can include your script once, and the changes will be tracked automatically. In that case, the call to the functions should be done at the REPL (not included in the scripts). Something like:

using Revise
include("set_data.jl")
includet("analyze.jl") # note the "t", for track
includet("report.jl") 

result = analyze(data)
report(result)

# Modify the functions inside analyze.jl and report.jl;
# the next run will automatically use the new versions:

result = analyze(data)
report(result)

Some people just use Revise by default. And Revise goes well with modules, in which case, if you had defined a module MyModule in a file MyModule.jl, with the functions of analyze.jl and report.jl, such as

module MyModule
  include("analyze.jl")
  include("report.jl")
  export analyze, report
end

loading it with

using Revise
using MyModule # the folder containing "MyModule.jl" must be on LOAD_PATH*

you will be able to modify the functions inside those files and they will always be automatically updated at every new call in the REPL.

These options do not work if you redefine a data structure; then you have to start over. I usually also keep a script that just runs the above commands to restart the development session when needed, starting Julia with julia -i devel.jl.

*If you want to load the module from another folder, you need to add that folder to the LOAD_PATH, with:

 push!(LOAD_PATH,"/path/to/MyModule")

(I don’t like this much, is there a way to just point directly to the module file?)
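One way to point directly at the module file, as far as I know, is to `include` it and then load it with a leading dot. A minimal sketch, using a temporary file and a made-up `analyze` function so that it runs on its own:

```julia
# Write a hypothetical module file into a temporary folder.
dir = mktempdir()
modfile = joinpath(dir, "MyModule.jl")
write(modfile, """
module MyModule
export analyze
analyze(x) = sum(x) / length(x)
end
""")

include(modfile)   # evaluates the file, defining Main.MyModule
using .MyModule    # leading dot: a module already defined in the current scope

mean_value = analyze([1, 2, 3])
```

With Revise loaded, replacing `include(modfile)` by `includet(modfile)` should also track later edits to the file, but I have not shown that here since it needs the package installed.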


As I come from Matlab as well, I can relate to your questions. My main issue was that I needed to recompile whenever I wanted to “clean the workspace”, which of course takes time if done repeatedly.
Most of the answers in this thread already point to Revise.jl, which helps a lot here: it can apply changes without recompiling everything.

If you like using IDEs to run scripts and calculations, there is a wonderful VS Code plugin for the Julia language. It has options to run your code similarly to Matlab. Also, it uses Revise internally, so any changes made to your scripts get applied without recompilation. Another cool thing is that you can inspect variables when debugging.
Big thumbs up to our VS Code and Revise heroes :+1: :+1:


Thank you all for the answers. I feel very welcomed by the enthusiasm :slight_smile: . I’m very inclined to use Revise.jl, it seems like the best way to get what I need at the moment.

I think @klaff’s example is very close to what I can do. I’m curious about that |> notation. Is that “a pipe”? Aside from code space, is it the same as

out = readmyfile(fn)
out = smooth(out)
out = findevents(out)

and so on?

To keep things tidy, I intend to create a “myfunctions.jl” with function declarations and “process.jl” to call.

Is there any benefit in transforming “myfunctions” into a module? (I don’t know if Julia ends up pre-compiling if it finds it’s a module, or it’s all the same).

I understand that Modules are better for sharing code that is to be repeated, but, in this case, it’s a matter of using specific functions to process specific data, so I’m more interested in modifying them and optimizing them individually (if/when necessary).

You can do that and still use a module. Using Revise will allow you to load all your functions and modify them just by loading the module in which they were included.


Yes, |> is a pipe. Documentation here: https://docs.julialang.org/en/v1/manual/functions/#Function-composition-and-piping

For some problems it feels really natural to me to use. It only works with functions of a single argument, but the way I develop something like this is to write the file reader, get it returning a table or DataFrame, then make the next function, and work my way down the chain.
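A small sketch of the equivalence, with made-up single-argument stages (`double`, `addone`, `total`) standing in for the reader and processing functions. The last line shows one way around the single-argument restriction: wrap the extra argument in an anonymous function.

```julia
# Hypothetical single-argument stages.
double(x) = 2 .* x
addone(x) = x .+ 1
total(x)  = sum(x)

input = [1, 2, 3]

# Step by step, reusing one variable.
out = double(input)
out = addone(out)
out = total(out)

# The same chain written as a pipe and as nested calls.
piped  = input |> double |> addone |> total
nested = total(addone(double(input)))

# A stage that needs an extra argument can be wrapped in an anonymous
# function (the parentheses keep the lambda from swallowing the rest).
with_offset = input |> double |> (x -> x .+ 10) |> total
```

All three of `out`, `piped` and `nested` hold the same value, so the pipe is just a different way of writing the chain.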

It is also possible to combine piping and broadcasting by using .|> rather than |>, so you can write your chain to work on a single object, then add an @. up front and change the initial argument to a vector, and boom, you’re off to the races.
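Roughly like this, with two made-up scalar functions (`halve` and `square`) as the stages:

```julia
# Hypothetical scalar stages.
halve(x)  = x / 2
square(x) = x^2

# The chain on a single object.
single = 4 |> halve |> square

# The same chain applied element by element with the broadcasting pipe.
many = [2, 4, 6] .|> halve .|> square

# Equivalent: keep the plain |> and let @. add the dots.
with_macro = @. [2, 4, 6] |> halve |> square
```

Here `many` and `with_macro` should come out identical, one result per input element.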


There is also function composition like:

julia> f(x) = 2*x
f (generic function with 1 method)

julia> g(x) = 4*x
g (generic function with 1 method)

julia> (f∘g)(1)
8

julia> h(x) = 8*x
h (generic function with 1 method)

julia> ∘(f,g,h)(1)
64

(type \circ and press Tab to get the ∘ symbol)
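One thing the multiplication example hides is the order of application: composition runs right to left, while a pipe runs left to right. A quick sketch with two made-up functions whose order matters:

```julia
# Hypothetical functions that do not commute.
inc(x) = x + 1
dbl(x) = 2 * x

composed    = (dbl ∘ inc)(3)   # dbl(inc(3)): inc runs first
piped_same  = 3 |> inc |> dbl  # same order as above, read left to right
other_order = (inc ∘ dbl)(3)   # inc(dbl(3)): dbl runs first
```

So `f ∘ g` applied to `x` matches `x |> g |> f`, not `x |> f |> g`.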


Interesting constructs! Julia is becoming a thing where the more I learn, the more I find interesting things to try out!

I’m trying to use VS Code to organize my projects and it shows me these suggestions:
[Screenshot: VS Code suggestions]

Are they useful/meaningful, or are they more in the "nice-to-have"s category?
The first option adds

export CSV, PooledString

whereas the second one adds this:

using CSV: read

I think I fully understand this second one. Load only parts of the module that you directly intend to use in your code, but I don’t understand the ‘export’ part.
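For reference, a minimal sketch of what `export` does in general, using a made-up module `Shapes`. This only illustrates the keyword itself; whether the VS Code suggestion is the right thing for a given file is a separate question.

```julia
module Shapes            # hypothetical module, for illustration only
export area              # names listed here become visible after `using`
area(r) = pi * r^2
perimeter(r) = 2pi * r   # defined but not exported
end

using .Shapes

a = area(1.0)               # works unqualified because it is exported
p = Shapes.perimeter(1.0)   # non-exported names need the module prefix

# Alternatively, bring one specific name into scope explicitly.
using .Shapes: perimeter
p2 = perimeter(1.0)
```

In short, `export` is written inside a module to decide which of its names a plain `using` makes available, while `using X: name` is written by the caller to pick names individually.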