Best practise: organising code in Julia

Hi

A newbie here, from an OOP background. I am looking to put together a project which will:

  1. Download data from Internet
  2. Do some manipulations on the downloaded data
  3. Do some analysis on the downloaded data

Please note that the downloader and the analysis would be reused across other projects as well.

I have been reading up Julia docs and looking at how other Julia projects have been implemented. In Java, I would have organised these into separate packages and class files.

Options:

  1. Create modules: the Julia docs suggest modules as a way to organise code into coherent units. They also seem to offer advantages such as precompilation. When should I use modules? What are the best practices?
  2. Create multiple files and “include” them in another file, as the Indicators.jl project does.

Considerations while selecting an option:

  1. Performance: this is key
  2. Debugging: since I am new, I would like to be able to debug quickly
  3. Readability and Reusability
  4. Code defined in one file would be used in another file.
  5. Please consider that this project could become quite big and complex.

I am wondering which is the best practice on organising my code.

ta!

Reference:
Julia docs on Modules: Modules · The Julia Language
Example project which shows how to include other scripts: https://github.com/dysonance/Indicators.jl
My other post shows how I plan to structure my project: Best practice to support multiple implementations - #11 by stevengj


See many previous discussions:

  • Pros & Cons of using modules vs plain include?
  • Single module vs. submodules in a project
  • Large programs: structuring modules & include such that to increase performance and readability
  • What is the preferred way to use multiple files?

TLDR: Start with a single module in a single package, split into as many files as you want, and refactor as needed later on.

  • Create at least one package for any major project, so that you can benefit from the Julia package system and the tooling around it.
  • The most common pattern is a single module per package, but split up the implementation into as many files as you want, and just include them from the top-level MyPackage.jl file. (Each file is included once. It is no problem to call code from one file in another file of the same module.)
    • The reason to use submodules within a package is if you are running into namespace collisions from different parts of your implementation, but you can always introduce submodules later if this becomes an issue.
  • Generally, split your project into multiple packages once you have functionality that could reasonably be used on its own. You can always do this later.
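As a rough sketch of the single-module, multiple-files pattern described above: the example below writes three small files to a temporary directory and loads them, so it can be run as-is. The names MyPackage, download.jl, analysis.jl, fetch_data and summarize are hypothetical placeholders, and in a real package these files would live in the package's src/ directory rather than being generated.

```julia
# Sketch only: all file, module and function names here are made up for illustration.
dir = mktempdir()

# First implementation file: data acquisition.
write(joinpath(dir, "download.jl"), """
fetch_data() = [1.0, 2.0, 3.0]   # stand-in for a real download
""")

# Second implementation file: it calls fetch_data from download.jl directly,
# because both files end up inside the same module.
write(joinpath(dir, "analysis.jl"), """
summarize() = sum(fetch_data()) / length(fetch_data())
""")

# Top-level file: defines the single module and includes each file once.
write(joinpath(dir, "MyPackage.jl"), """
module MyPackage
export summarize
include("download.jl")
include("analysis.jl")
end
""")

include(joinpath(dir, "MyPackage.jl"))
using .MyPackage

println(summarize())   # prints 2.0
```

The order of the include calls only matters for things that must exist when a file is evaluated (types, macros, constants); plain function calls across files resolve when the functions are called.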

(Use Revise.jl during development, so that you can edit the package and execute the new code without having to re-load the whole package.)


OP wants to download and analyze some data. What is the benefit of using a module here? Modules make it harder to selectively execute code. For example, given

module M
x = download()
# I want to explore here.
# ...

y = runmodel(x)
end

If I want to explore the data interactively without running runmodel, I have to comment out runmodel – I can’t just choose expressions to run in the module.

Presumably, they want to do a similar analysis more than once — if @roh_codeur is asking about large-scale code organization, they must not be talking about a one-time throwaway script. If they have a non-trivial amount of code, separating it in a package makes it easier to re-use in multiple different projects.

That is, create a MyDataAnalysis.jl package (hence a module) that has the re-usable parts of the analysis and data acquisition that you want to do. Then, for any particular project or task, write a script or Jupyter notebook etcetera that does:

using MyDataAnalysis # & other packages as needed ...

#... acquire some specific data ...
data = ...

# run some analysis
MyDataAnalysis.frobnicate(data)

To selectively execute code (i.e. re-use pieces), the best approach in the long run is generally to separate it into functions that you call individually, not by commenting out or selectively executing sections in a big script — that way lies madness for large projects.
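A minimal sketch of what that can look like, assuming the module is defined inline rather than as an installed package: only frobnicate appears earlier in this thread; acquire and clean are hypothetical helper names, and the data are made up.

```julia
# Sketch only: the steps are functions, so a script or REPL session can call
# exactly the ones it needs and inspect intermediate results in between.
module MyDataAnalysis

export acquire, clean, frobnicate

acquire() = [3.0, 1.0, missing, 2.0]        # stand-in for a real download
clean(data) = collect(skipmissing(data))    # drop missing values
frobnicate(data) = sum(data) / length(data) # stand-in for the real analysis

end

using .MyDataAnalysis

raw = acquire()
data = clean(raw)
# Explore `raw` and `data` interactively here; nothing forces the analysis to run.

result = frobnicate(data)
println(result)   # prints 2.0
```

With this layout, exploring the data without running the model means simply not calling frobnicate, rather than commenting code out inside the module.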

For small throw-away scripts, of course, I agree — just make a Pluto/Jupyter notebook or similar and edit/execute it in chunks until it does what you want.


that’s precisely what I have in mind.

@stevengj: thanks for your suggestions and the links. I will work on a solution and post back. The only bit I am not sure about is how debugging would work with “include” files in VS Code. I will give it a whirl.

ta!

If you can get the book Hands-On Design Patterns and Best Practices with Julia, it might have many answers to what you are asking.


Another part of the problem described by the OP is the organization of code, data, workflows, etc. For that I would also recommend taking a look at DrWatson:

https://github.com/JuliaDynamics/DrWatson.jl


In addition to my +1, I’d like to mention that you can now buy the PDF for 5 € (Hands-On Design Patterns and Best Practices with Julia | Packt) [campaign expired]. Well worth the time, in my opinion: maybe skim quickly through the initial design pattern material and concentrate on “Section 2: Julia Fundamentals”, then read Section 3 as you please. For me the benefit was not so much the patterns as the many nicely presented code examples.


Thanks to everyone who pitched in; this is quite useful indeed. So far, I have come up with the following:

my-types.jl

Base.@kwdef struct SomeType
    field1::String
end

my-utils.jl

using DataFrames

function someFunc(someType::SomeType)
    println("using SomeType $someType")
    df = DataFrame()
    return df
end

ModelRunner.jl

module ModelRunner

# File names must match the files shown above (my-types.jl, my-utils.jl).
include("my-types.jl")
include("my-utils.jl")

export SomeType, someFunc, runModel

function runModel(someType::SomeType)
    println("in runModel()")
    someFunc(someType)
end


end

my-script.jl

baseDir = @__DIR__
@info "Starting in $baseDir"
cd(baseDir)

push!(LOAD_PATH, pwd())

using ModelRunner

someType=SomeType(field1="aaa")
runModel(someType)

potential improvements:

  1. See how I can remove the push! statements in the script
  2. I haven’t had a chance to look at all the resources yet, but I have ordered the book. I am afraid I am still not clear on the benefits of making each file a module versus including the files in one module. The only difference I can see so far is that I could not reuse my-utils.jl on its own in a different project, since it depends on my-types.jl. So, to reuse code across projects, I would either have to make each file a module or follow the same pattern as above. However, if I make each file a module, it takes a while to load.
  3. Does the module vs include approach have any performance impact? I expect my project to be at least a few thousand lines of code.
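
For item 1, one possible direction (a sketch, not something tested against the files above) is to include the module file from the script and load it with a leading dot, which should avoid touching LOAD_PATH. The ModelRunner below is a simplified, hypothetical stand-in written to a temporary directory so the sketch is self-contained: it drops the DataFrames dependency, and runModel just returns a string.

```julia
# Sketch only: a cut-down stand-in for ModelRunner.jl, generated on the fly.
dir = mktempdir()
write(joinpath(dir, "ModelRunner.jl"), """
module ModelRunner

export SomeType, runModel

Base.@kwdef struct SomeType
    field1::String
end

runModel(s::SomeType) = "ran model with " * s.field1

end
""")

# In the script: include the file, then load the module relative to the
# current namespace. The leading dot means no LOAD_PATH entry is needed.
include(joinpath(dir, "ModelRunner.jl"))
using .ModelRunner

someType = SomeType(field1 = "aaa")
println(runModel(someType))   # prints "ran model with aaa"
```

In a real script the include path would be something like joinpath(@__DIR__, "ModelRunner.jl"). Turning ModelRunner into a proper package, as suggested earlier in the thread, would be the other way to get rid of the push! call.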

I look forward to your feedback

ta!

While looking through the documentation, I found the below link. I feel this answers most of the queries, except the one about performance, i.e. performance benefits of one module with include of multiple files vs multiple modules.

https://docs.julialang.org/en/v1/manual/code-loading/

I will post back my findings here

I have posted my findings in this thread. For someone ending up here, Option (2) seems to be a better option.