Multiple Julia processes while using Modules

Hi

Following on from my discussion in the thread below, I have been trying out two approaches to organising a Julia project.

  1. Creating each file as a module and loading it from other files with “using”
  2. Keeping them as plain files and pulling them into other files with “include”

The code for approach 1 looks like this:

module MyModule

using Dates
using DataFrames

function MyModuleMethod()
    println("some method")
end

export MyModuleMethod
end

module MyModule2

using MyModule

function MyModuleFunc2()
    println("MyModuleFunc2 called")
end

export MyModuleFunc2

end

module MyRunner

using MyModule
using MyModule2

function __init__()
    println("in init")
end

function main2()
    MyModuleMethod()
end

export main2

end

runner.jl

baseDir = @__DIR__
@info "Starting in $baseDir"
cd(baseDir)

push!(LOAD_PATH, pwd())

using MyRunner
println("starting")

main2()

println("done")

This approach seems considerably slower than approach 2. While looking into the reason, I noticed that it was spinning up multiple Julia processes and using a lot of memory on my machine. The second approach works remarkably well: it’s faster and doesn’t spin up multiple processes.

My machine:
Apple MacBook Pro M1 Max
Julia: 1.7.1

I am going with approach (2) for now.
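
For comparison, approach 2 can be sketched roughly as below. This is only a minimal sketch, not the actual project: the file names part1.jl, part2.jl and MyProject.jl and the module name MyProject are made up for illustration, and the files are written to a temporary directory purely so that the snippet runs on its own.

```julia
# Hypothetical layout for approach 2: one module, several plain files.
dir = mktempdir()  # scratch directory so the sketch is self-contained

# Plain files: no `module` wrapper, just definitions.
write(joinpath(dir, "part1.jl"), """
function MyModuleMethod()
    println("some method")
end
""")

write(joinpath(dir, "part2.jl"), """
function MyModuleFunc2()
    println("MyModuleFunc2 called")
end
""")

# The single module pulls the plain files in with `include`.
write(joinpath(dir, "MyProject.jl"), """
module MyProject

include("part1.jl")
include("part2.jl")

export MyModuleMethod, MyModuleFunc2

end
""")

include(joinpath(dir, "MyProject.jl"))  # defines Main.MyProject
using .MyProject                        # relative `using`, no LOAD_PATH lookup

MyModuleMethod()
MyModuleFunc2()
```

Nothing here is looked up on LOAD_PATH, so no per-module precompile cache is involved; the code is compiled inside the running session.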

  1. I would like to know where I went wrong
  2. It seems that modules carry a memory overhead (along with a compilation overhead). That’s a bit of a shame because I am trying to port a massive project to Julia, and I am learning Julia while I do it. Revise.jl is excellent because it recompiles changes to modules, so separate small modules seemed like a great idea. Now, with approach (2), I have a single module, which takes a long time to compile. I must point out that subsequent compiles are pretty fast.

thanks

cheers
Roh

I think you are measuring precompilation of DataFrames.jl. Your code itself takes no time at all.

Yeah, I have been playing around and indeed, my code doesn’t take much time. I still don’t know what caused the multiple Julia processes to spawn, though.

Petr mentioned it earlier: precompilation. Julia is a compiled language in normal usage. When you make a change to a source file on the LOAD_PATH, Julia will invalidate the compilation cache and perform precompilation. Note that this precompilation step consists of parsing and type inference, and not native code generation. As of Julia 1.6, there is also parallel precompilation.
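
If the extra processes do come from Pkg's parallel precompilation, one way to check is to cap the number of precompile tasks and trigger precompilation by hand. A minimal sketch, assuming the project is managed through Pkg (which the LOAD_PATH setup in runner.jl is not) and that limiting the tasks to 1 is acceptable for the experiment:

```julia
using Pkg

# Pkg reads this environment variable to limit how many precompile
# tasks run at once; "1" is an arbitrary choice for this experiment.
ENV["JULIA_NUM_PRECOMPILE_TASKS"] = "1"

# Precompile the dependencies of the active project ahead of time,
# instead of waiting for the first `using`.
Pkg.precompile()
```

Watching the process list while this runs should show whether the number of Julia processes follows the setting.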

Looking through both this thread and the other thread, I’m not sure if the importance of Project.toml has been fully appreciated. In particular, the UUID is really important in terms of precompilation and caching. By modifying the LOAD_PATH you are bypassing these normal mechanisms of code loading.

In the previous thread you mentioned Java. In Java, a class file is the basic compilation unit. The name of the class is qualified by its Java package. In Julia, a package is the basic compilation unit. It is qualified by its UUID in Project.toml. If there are two package modules of the same name, Julia’s package manager can distinguish them via their UUID.
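
To see the name-plus-UUID identity in practice, the standard library package Dates (already used in MyModule) can be looked up from the REPL. A small sketch; `Base.identify_package` is an unexported helper, so treat it as a diagnostic rather than something to build on:

```julia
# Look up how Julia identifies the package named "Dates" in the
# current environment. Returns a Base.PkgId, or nothing if not found.
pkg = Base.identify_package("Dates")

println(pkg.name)  # the package name
println(pkg.uuid)  # the UUID that distinguishes it from any other "Dates"
```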

What I have not seen from you above are any Project.toml files. What I have seen is LOAD_PATH hacking. Do away with the LOAD_PATH hacking. Make each of your modules into true packages, which can be compiled and cached separately.
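
As a rough illustration of what "true package" means here, Pkg can generate the skeleton, including a Project.toml with a fresh UUID. This is a sketch under assumptions: the temporary directory is only there so the snippet does not touch a real project, and MyModule is reused as the package name from the first post.

```julia
using Pkg, TOML

# Scratch location (an assumption for this sketch).
workdir = mktempdir()
pkgdir = joinpath(workdir, "MyModule")

# Creates MyModule/Project.toml (with a fresh UUID) and MyModule/src/MyModule.jl.
Pkg.generate(pkgdir)

# Read back what was generated.
project = TOML.parsefile(joinpath(pkgdir, "Project.toml"))
println(project["name"])
println(project["uuid"])
```

From there, activating the runner's own project with `Pkg.activate` and calling `Pkg.develop(path="MyModule")` would record the package in that project's Manifest.toml, so `using MyModule` resolves without any change to LOAD_PATH.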

@mkitti thanks for your response mate. I am sorry, I am a newbie to Julia. I have since read about the Project.toml and see what you mean. I will give it another shot and come back to this thread.

ta