Julia vs Python project/package structure

As I started to address in a previous post [1], I’m confused by the concepts around Julia project structure and module re-use.

Background

For the sake of this discussion, I’ve created two repositories attempting to accomplish the same simple distance calculation: one in Python and one in Julia.

I cannot get the Julia one to run, for reasons partially discussed in this StackOverflow question, which has been graciously answered by @bkamins. Specifically, I cannot give an imported function a typed signature, due to type import problems detailed in the StackOverflow question. @bkamins suggests making structs.jl [2] a local package, which confuses/surprises me.

Confusion: Python vs. Julia

I think I’m confused by the different concepts being used in Python and Julia imports. In Python, each file is a namespace and all functions + variables in the file are exported. In Julia, there can be multiple modules per file and functions must be explicitly exported.

Is it correct to say that modules in Julia are synonymous with namespaces in Python?

Going further, in Python, a package is a set of files shipped together and is often synonymous with a “project”. In Julia, is it common for a project to have multiple local packages? If yes, what is the reason for this design choice? I ask, not as a criticism, but so I can better understand/remember Julia project structure.

[1] I started a new thread, because the original thread was on a different topic.
[2] See linked Julia repository for structs.jl source and purpose

I am not a core dev and did not design it, but I can give you my practical understanding:

  • A module introduces a new global namespace.
  • A package is a project with reusable functionality for applications or other packages to use.
  • An application is a project that has some standalone functionality.

A package or application may have many modules. Also, a package has to be installed via the package manager in order to be visible within an application or another package.

is it common for a project to have multiple local packages

No, it is not common. A common pattern, in my experience, is to have sequential include statements in some master file. For example, have a look at the reasonably small main file of this package: https://github.com/JuliaData/CategoricalArrays.jl/blob/master/src/CategoricalArrays.jl.

You would create a package if you had some reusable functionality. In the earlier post I mentioned creating a package only to show that you could configure everything so that you could write multiple using Something statements and Julia would know that they all refer to the same thing. The normal approach, again in my experience, would be the one that I described in the SO answer.


I have seen this as well, but did not understand it. For example, in the CategoricalArrays repo, the file pool.jl makes use of functions declared in buildfields.jl. However, there is no import or using statement at the top of the pool.jl file. How does pool.jl know about the relevant function definitions? Does it have something to do with the package building process and how CategoricalArrays, as a module, creates a namespace for all these files to share?

Was your SO answer also referring to making a package to share the namespace?

I will try to answer these questions myself by reading through some documentation.

How does pool.jl know the relevant function definitions? Does it have something to do with the package building process and how CategoricalArrays, as a module, creates a namespace for all these files to share?

No, this has nothing to do with the package building process or even with the fact that CategoricalArrays is a module. The simple reason is that in Julia, unlike Python, files do not create namespaces automatically. So if you include("A.jl") and then include("B.jl"), you’ll get the same behavior as if you’d pasted in the contents of those two files (at the top-level, to be specific). So the fact that a module is split across multiple files is purely an organizational choice to improve maintainability and/or readability of the code or to satisfy the preferences of the author.
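As a minimal sketch of that behavior, the snippet below writes two small files to a temporary directory and includes them one after the other. The file names A.jl and B.jl follow the text above; the functions helper and double_plus_one are hypothetical and exist only for illustration.

```julia
# Write two hypothetical source files to a temporary directory
# so that the sketch is self-contained.
srcdir = mktempdir()

write(joinpath(srcdir, "A.jl"), """
helper(x) = 2x
""")

write(joinpath(srcdir, "B.jl"), """
# No using or import statement here: helper is visible because both
# files are evaluated in the same module (Main, in this sketch).
double_plus_one(x) = helper(x) + 1
""")

# Equivalent to pasting the contents of both files here, in this order.
include(joinpath(srcdir, "A.jl"))
include(joinpath(srcdir, "B.jl"))

println(double_plus_one(3))   # prints 7
```

The same thing happens inside a package: the main file opens the module, and every included file is evaluated in that module's namespace.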

For more information, see: Modules · The Julia Language

Is it correct to say that modules in Julia are synonymous with namespaces in Python?

Mostly, but with some exceptions. Python packages mimic the file structure because files create namespaces, while in Julia this is not the case. But we otherwise generally use them in similar ways and for similar purposes.
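A small sketch of modules acting as namespaces, with explicit exports. The module names Geometry and TextTools and their functions are hypothetical, chosen only to show that the same function name can live in two modules without clashing.

```julia
module Geometry
export distance                  # only exported names are brought in by `using`

distance(a, b) = sqrt(sum((a .- b) .^ 2))
scale(v, k) = k .* v             # not exported, still reachable as Geometry.scale
end

module TextTools
# Same name as Geometry.distance, but in a separate namespace.
distance(a, b) = abs(length(a) - length(b))
end

using .Geometry                  # leading dot: a module defined in the current scope

println(distance((0, 0), (3, 4)))        # prints 5.0
println(Geometry.scale([1, 2], 3))       # prints [3, 6]
println(TextTools.distance("abc", "a"))  # prints 2
```

Both modules sit in one file here, which a Python file-per-namespace layout would not allow.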

Going further, in Python, a package is a set of files shipped together and is often synonymous with a “project”. In Julia, is it common for a project to have multiple local packages?

No, a Julia package also defines a single module whose name matches that package. That module can, however, have additional sub-modules. For example:

(v1.0) pkg> add MathOptInterface

julia> using MathOptInterface  # the module provided by the MathOptInterface package

julia> using MathOptInterface.Bridges  # a sub-module

julia> using MathOptInterface.Utilities  # another sub-module

The presence or absence of sub-modules also has no required connection to the file structure. A hypothetical implementation of the MathOptInterface package could contain just one source file:

module MathOptInterface

module Bridges
end

module Utilities
end

end

or many files, combined using include() (that’s what the actual MathOptInterface package does).
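The multi-file variant could look roughly like the sketch below. It is not the actual MathOptInterface layout: the package name Toy, the file names, and the describe functions are all hypothetical. One sub-module lives in its own file and the other is written inline, yet both end up nested in the same parent module.

```julia
# Create a hypothetical two-file source tree in a temporary directory.
pkgdir = mktempdir()

write(joinpath(pkgdir, "bridges.jl"), """
module Bridges
describe() = "bridges"
end
""")

write(joinpath(pkgdir, "Toy.jl"), """
module Toy

include("bridges.jl")    # path is resolved relative to Toy.jl

module Utilities         # defined inline, no separate file needed
describe() = "utilities"
end

end
""")

include(joinpath(pkgdir, "Toy.jl"))

println(Toy.Bridges.describe())     # prints bridges
println(Toy.Utilities.describe())   # prints utilities
```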


Do you know the history of this decision? It’s unlike any other programming language I’ve used, so I’m curious what motivated it.


I don’t know about the history of the decision in Julia, but in C you can (theoretically) do the same thing with its #include. Granted, there isn’t really any concept of modules in C, though.