How to improve compilation time and loading time of packages?

EDIT: I was confusing the compilation time of the utility package with its loading time. See the message below:

Original post:

Hello, I’m trying to improve the loading time of my code, which is currently ruining my development productivity. I am using Revise.jl, but I still very often need to restart my session and reload all of my project’s code.

My first attempt has been to take some of the code out of the main project (MyMainProject) and put it in a separate package called PostgresqlDAO.

This package is not on GitHub; I make it accessible to MyMainProject with push!(LOAD_PATH, "/home/myuser/CODE/PostgresqlDAO.jl/")

In runtests.jl in my main project I now have the following:

using Pkg

push!(LOAD_PATH, "/home/myuser/CODE/PostgresqlDAO.jl/")

using Revise

using Test
@time using TickTock, Random, Dates, UUIDs
@time using RDatasets # to check how long it takes to load a big package
@time using PostgresqlDAO

@time using TickTock, Random, Dates, UUIDs returns: 0.006625 seconds (21.84 k allocations: 1.161 MiB)

@time using RDatasets returns: 2.494468 seconds (6.75 M allocations: 379.499 MiB, 5.19% gc time)

@time using PostgresqlDAO returns: 7.691385 seconds (19.70 M allocations: 1014.108 MiB, 4.96% gc time)

The loading time of the code of PostgresqlDAO is actually the same as before (when it was embedded in MyMainProject).

I get the following messages in the console, which I interpret to mean that PostgresqlDAO is recompiled every time I start a new session:

[ Info: Recompiling stale cache file /home/myuser/.julia/compiled/v1.1/PostgresqlDAO/aXv1G.ji for PostgresqlDAO [9f278218-a07e-11e9-18c7-113e7cd19c36]
[ Info: Recompiling stale cache file /home/myuser/.julia/compiled/v1.1/MyMainProject/bj4UF.ji for MyMainProject [9f980d3e-8c05-11e9-23d2-ff41aa27f809]
┌ Warning: Package MyMainProject does not have PostgresqlDAO in its dependencies:
│ - If you have MyMainProject checked out for development and have
│   added PostgresqlDAO as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with MyMainProject
└ Loading PostgresqlDAO into MyMainProject from project dependency, future warnings for MyMainProject are suppressed.
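Regarding the warning above: one possible alternative to the LOAD_PATH approach is to register the local package as a tracked dependency of the active project with Pkg.develop, which may address the "does not have PostgresqlDAO in its dependencies" message. This is only a minimal sketch: the helper name develop_local_package is hypothetical, and it assumes the environment of MyMainProject is the active one when it is called.

```julia
using Pkg

# Hypothetical helper (not from the original post): track a local package
# by path so that it is recorded in the active project's dependencies.
function develop_local_package(path::AbstractString)
    isdir(path) || error("no package directory found at $path")
    Pkg.develop(Pkg.PackageSpec(path = path))
end

# Usage, assuming MyMainProject's environment is active:
# develop_local_package("/home/myuser/CODE/PostgresqlDAO.jl/")
```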

Is there a way to ‘cache’ the compilation of a local package?
If not, what can I do? Would putting it on GitHub as an unregistered package help?
In general, are there some good practices to improve the compilation time of our code?

What is the reason you need to reset your session despite using Revise.jl?

Have you tried PackageCompiler.jl’s incremental compilation?

Does setting JULIA_DEBUG=loading give any reason for being unable to use the cache? That shouldn’t happen if nothing changed.
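For reference, the same debug setting can also be switched on from inside a running session, before the package is loaded. This is a small sketch; whether any messages appear depends on whether the package's cache file is actually checked during loading.

```julia
# Enable debug-level log messages for the "loading" group.
ENV["JULIA_DEBUG"] = "loading"

# Then load the package under investigation, for example:
# using PostgresqlDAO
```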


Unfortunately, setting JULIA_DEBUG to loading doesn’t give any additional log messages when calling using NameOfAPackage:

$ JULIA_DEBUG=loading julia -e 'using InteractiveUtils; using Pkg; versioninfo();'
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
  JULIA_DEBUG = loading

gosh…problems are piling up

A very common reason I run into for needing to reset my session is when I modify a struct.
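To illustrate the struct case, here is a small sketch that defines a struct in a scratch module and then tries to redefine it with different fields. On the Julia 1.1 used in this thread the second definition is rejected as an invalid redefinition of a constant; other Julia versions may behave differently, so the sketch only reports what happened. The names StructDemo and Point are made up for the example.

```julia
# Scratch module so the demo does not touch Main.
demo = Module(:StructDemo)

# First definition of the struct.
Core.eval(demo, quote
    struct Point
        x::Int
    end
end)

# Try to redefine it with an extra field and record whether that failed.
redefinition_failed = try
    Core.eval(demo, quote
        struct Point
            x::Int
            y::Int
        end
    end)
    false
catch
    true
end

println("redefinition failed: ", redefinition_failed)
```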


Some additional information: the utility package is actually not recompiled if it is not changed.

If the utility package is modified, calling @time using PostgresqlDAO gives:

[ Info: Recompiling stale cache file /home/myuser/.julia/compiled/v1.1/PostgresqlDAO/aXv1G.ji for PostgresqlDAO [9f278218-a07e-11e9-18c7-113e7cd19c36]
 15.867777 seconds (21.47 M allocations: 1.073 GiB, 2.41% gc time) 

If the utility package is NOT modified, calling @time using PostgresqlDAO gives:

7.756099 seconds (19.76 M allocations: 1015.470 MiB, 4.58% gc time)

Those 7.75 seconds are because the loading of the utility package implies the loading of its own dependencies. If I preload those dependencies (using Tables, DataFrames, JuliaDB, Query, LibPQ, Dates, UUIDs, TickTock) calling @time using PostgresqlDAO gives:

0.159535 seconds (205.38 k allocations: 10.726 MiB, 4.35% gc time)

No loading time anymore (I guess this is the case as long as the versions of the dependencies declared in the Manifest.toml of MyMainProject and PostgresqlDAO are the same).
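To see which dependency accounts for most of that load time, each one can be timed separately. The following is a rough sketch: the list uses three standard libraries as stand-ins so that it runs anywhere, and it should be replaced with the real dependencies (Tables, DataFrames, JuliaDB, ...). Note that the order matters, since loading one package also loads whatever it shares with the later ones.

```julia
# Stand-in list so the sketch runs anywhere; swap in the real dependencies.
deps = [:Dates, :UUIDs, :Random]

timings = Dict{Symbol,Float64}()
for dep in deps
    # Build the expression `using <dep>` and time its evaluation in Main.
    timings[dep] = @elapsed Core.eval(Main, Expr(:using, Expr(:., dep)))
end

# Print the slowest packages first.
for (dep, t) in sort(collect(timings); by = last, rev = true)
    println(rpad(string(dep), 12), round(t; digits = 3), " s")
end
```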

So, all that to say that what actually takes time is not the compilation but the loading of the packages required by my utility package PostgresqlDAO. A quick search shows me that I am not the only one who finds package loading very slow. I’ll dig into it to see if there’s anything useful I can find.

I’ve ended up using compile_incremental from package PackageCompiler.jl.

I have created a new system image that includes the packages required by my utility package and the main project. I didn’t include the utility package in the system image (because it is now very fast to compile and load and I may need to change it frequently).

using Pkg
using PackageCompiler

syso, sysold = PackageCompiler.compile_incremental(:DataFrames, :Query, :LibPQ,
                                                   # :JuliaDB, # not possible to add
                                                   blacklist = [:LaTeXStrings])

Note that I am ‘dev’ing PackageCompiler (rather than ‘add’ing it) because, at the time I tried to use PackageCompiler, there was a bug and people were advising to ‘dev’ it instead.

Once built, I relaunch julia with the sysimage option set to the path of syso (the first element of the tuple returned by PackageCompiler.compile_incremental).
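For illustration, the relaunch could look roughly like the sketch below, which builds the command from within Julia using the --sysimage command-line option. The helper name julia_with_sysimage and the path /path/to/sys.so are placeholders, not taken from the original setup; in practice the path would be the syso value.

```julia
# Hypothetical helper: build the command that starts Julia with a custom system image.
function julia_with_sysimage(sysimage_path::AbstractString)
    julia_exe = joinpath(Sys.BINDIR, Base.julia_exename())
    return `$julia_exe --sysimage $sysimage_path`
end

# Placeholder path; pass the `syso` value returned by compile_incremental instead.
cmd = julia_with_sysimage("/path/to/sys.so")
println(cmd)

# run(cmd)  # uncomment to actually start the new session
```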

This is not ideal because the system image takes a very long time to create and I may need to update it often. I am also not sure how this works if I want to change the version of a package or if I need different versions of a package depending on the project I am working on.

But at least the loading of my utility package now takes 0.072997 seconds.


Useful feedback!
I am also struggling with both loading and compiling taking a long time. I have considered going the sysimage-building route in the past, but I was also concerned about how often I would have to rebuild the sysimage, and decided to hold off until closer to production time.

Can anyone explain what is happening during the loading process? I would naively think it should be very fast to open the files and do minimal syntax parsing to be able to split it into functions etc. The real effort should come in when the call to invoke the function happens later and it needs to be compiled. But I’m guessing it is building the full AST that is taking time and not just going looking for keywords that start new blocks and finding the matching end statement to close the blocks? I guess if you try that shortcut certain macros would break.