How to manage Julia Projects

Hey all,

I’ve recently started digging into Julia and am trying to implement some of my Python projects in Julia in order to get used to the language (these are mainly data science related projects). There is a lot I like out of the box, but I am currently struggling to structure a project I’m working on.

This is how I usually structure my Python projects (which is similar to Julia’s package structure):

…/ProjectName/
├── src/
│ ├── definitions.py (contains constants)
│ ├── train.py
│ ├── lib/
│ │ ├── dataset.py
│ │ ├── helpers.py
│ ├── models/
│ │ ├── modela.py
│ │ ├── modelb.py
├── data/…
├── results/…
├── tests/…
├── poetry.lock
├── pyproject.toml
├── README

How I work through this process:
I first define constants in definitions.py, usually absolute paths to the contents of the data/, results/ folders etc.
This file exports a Dict:

# from src/definitions.py
PATH = {"data1" : "somePath", ...}

I can then import this variable in dataset.py by:

# from src/lib/dataset.py
from src.definitions import PATH
print(PATH)

I can run this file to check its output by running:

$ poetry shell   # activates virtual env
(myEnv) $ python -m src.lib.dataset   # -m runs the file as a module
{'data1': 'somePath', ...}

Say I have the exact same setup for a Julia project. I then face two issues when trying to do the same:

  1. How do I enter the correct Pkg environment from bash? If I execute
$ julia src/lib/dataset.jl

This runs the file in my global Julia environment, not in my project environment.
Others suggested starting Julia from bash, setting the environment with activate . and then using include("src/lib/dataset.jl") to “run” the file. I really don’t like this; is there a more straightforward way to do it?

  2. How can I import from top-level files from within sub-directories?
    An obvious way is to just run
# from src/lib/dataset.jl
include("../definitions.jl")
print(PATH)

But many files will need to import from this file, so I would rather not use include.
I have since been experimenting with turning all the files in my directory into modules and importing those modules from sub-directories, but to no avail. That also seems like a lot of boilerplate just to import some variables…

A true package of course defines a MyPackage.jl and exposes all its contents through include and export. But I don’t really find that very intuitive for data science projects.

I want to run something like:

# from src/lib/dataset.jl
import definitions: PATH   # or src.definitions: PATH

Anyone who has any suggestions for tackling this?

I encourage you to use the structure of a Julia package (created with PkgTemplates.jl) or, if the project is more complex, to look at DrWatson.jl, which targets more scientific applications (such as a data science project).

To answer your question 1, I recommend using:

julia --project=. src/lib/dataset.jl
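To confirm that a script really runs in the project environment, it can print the active project file. This is only a small sketch; the variable name proj is made up for illustration.

```julia
# Path of the Project.toml the current session is using,
# or nothing if no project is active.
proj = Base.active_project()
println(proj === nothing ? "no active project" : proj)
```

When started with julia --project=. from the project root, this should print the path of that project's Project.toml rather than the one of the default global environment.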

For question 2, include is a very useful way to bring in content, and it gives you all the flexibility you could ask for. However, defining a package is often more intuitive (I disagree that it is not intuitive).
Sharing variables (such as PATH) between files does not seem nice to me; I would use a function to return that information. DrWatson can be useful here, check it out.
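As a rough sketch of the "function instead of shared variable" idea: the module name Definitions, the function datapath and the file name data1.csv are all hypothetical, and the code assumes the file lives in the project's src/ folder.

```julia
module Definitions  # hypothetical name, e.g. in src/definitions.jl

export datapath

# Project root, assuming this file sits in <project>/src/
const PROJECT_ROOT = normpath(joinpath(@__DIR__, ".."))

# Build a path inside the data/ folder from its components
datapath(parts...) = joinpath(PROJECT_ROOT, "data", parts...)

end # module

using .Definitions

println(datapath("data1.csv"))
```

Callers then ask for a path through datapath(...) instead of reading a global Dict, so the folder layout is defined in one place only.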

Hey dmolina,

Thank you for your response, the DrWatson template looks very much like what I am trying to achieve.
I created a function in my .bashrc to run julia --project=. just to save me some typing :wink:

The main problem I fear with include is this: imagine I have a file that performs an intensive task. Say a top-level file includes that file, spends some seconds running its content, and continues including other files. If one of those files includes the intensive file again, the script gets executed again, and again you have to wait some seconds.

I guess in that case it will be my responsibility to prevent cyclic dependencies like this, but with using import this behaviour is prevented in the first place. I also like that with import I can be explicit in what specific function I import from the file, while using include I cover up the fact that I am executing the code and it is not really clear where specific functions originate from.

Another option, instead of using --project, is to set the environment variable JULIA_PROJECT to ".".

You should not put script code directly at the top level of these files, for several reasons:

  • The problem you describe.
  • All variables are global, and performance is a lot worse than you would expect.

These files should define functions, not act as scripts.
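A minimal sketch of what that looks like in practice (sum_squares and the sample data are made up for illustration): the work lives in a function and the data is passed in as an argument, so nothing in the loop depends on a non-constant global.

```julia
# The loop runs inside a function, so `total` and `x` are local variables
function sum_squares(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x^2
    end
    return total
end

sample = [1.0, 2.0, 3.0]
println(sum_squares(sample))  # prints 14.0
```

Including this file only defines sum_squares; the last two lines are there for the demo and would normally go into a main() function.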

If you want to create a file “main.jl” that can be run directly with julia but does not execute anything when it is included, you can do:

function main()
    # ...
end

# run main() only when this file is executed as a script (julia main.jl),
# not when it is loaded with include
if abspath(PROGRAM_FILE) == @__FILE__
    main()
end

Cyclic dependencies are avoided by a good design of the files; it is more a design problem than a language problem.

You can also use using to load just one piece of code, but it only works with packages (modules); unlike in Python, individual files are not modules.
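To make that concrete, here is a small sketch of how the import from question 2 could look once the code is organised as modules. All names (ProjectName, Definitions, Dataset, describe) are hypothetical, and the nested modules stand in for what would normally be include("definitions.jl") and include("lib/dataset.jl") inside src/ProjectName.jl.

```julia
module ProjectName  # hypothetical top-level module

module Definitions  # stands in for include("definitions.jl")
const PATH = Dict("data1" => "somePath")
end

module Dataset  # stands in for include("lib/dataset.jl")
# Explicit import from the sibling module; `..` refers to the parent module
import ..Definitions: PATH

describe() = "data1 lives at " * PATH["data1"]
end

end # module ProjectName

println(ProjectName.Dataset.describe())  # prints: data1 lives at somePath
```

Each file is included exactly once by the top-level module, and the other modules refer to it with import, so nothing is executed twice and it stays visible where PATH comes from.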
