Problems with executable compilation using PackageCompiler.jl

Hello everyone,
I was trying to build some Julia code into command-line applications with PackageCompiler.jl, but I noticed that they still rely on compiling functions (or parts of them) on the first call.

As a working example, I have modified the hello.jl example file:

```julia
module Hello

using UnicodePlots

function plot_unicode_sine()
    println(lineplot(1:100, sin.(range(0, stop=2π, length=100))))
end

Base.@ccallable function julia_main(ARGS::Vector{String})::Cint

    @time a = 10 # Just to make sure that everything executed by @time is compiled...

    @time println("hello, world")
    @time println("hello, world")

    @time plot_unicode_sine()
    @time plot_unicode_sine()

    return 0
end

end
```

After compiling it to an executable using the build_executable("hello.jl", "hello") command, the resulting ./hello executable gives me this output:

0.000000 seconds
hello, world
  0.006080 seconds (22 allocations: 1.547 KiB)
hello, world
  0.000043 seconds (4 allocations: 80 bytes)
      ┌────────────────────────────────────────┐ 
    1 │⠀⠀⠀⠀⠀⠀⠀⡠⠊⠉⠉⠉⠢⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⢠⠎⠀⠀⠀⠀⠀⠀⠘⢆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⢠⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠳⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⢠⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⢠⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠳⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⢀⠇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⡎⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠼⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠬⢦⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⢤│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠇│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡎⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡞⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡜⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡞⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⢆⠀⠀⠀⠀⠀⠀⢠⠎⠀⠀⠀⠀⠀│ 
   -1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⣀⣀⣀⠔⠁⠀⠀⠀⠀⠀⠀│ 
      └────────────────────────────────────────┘ 
      0                                      100
  0.341968 seconds (936.42 k allocations: 45.406 MiB, 0.58% gc time)
      ┌────────────────────────────────────────┐ 
    1 │⠀⠀⠀⠀⠀⠀⠀⡠⠊⠉⠉⠉⠢⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⢠⠎⠀⠀⠀⠀⠀⠀⠘⢆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⢠⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠳⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⠀⢠⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⠀⢠⠃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠳⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⢀⠇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠀⡎⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀│ 
      │⠼⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠬⢦⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⢤│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠇│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡎⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡞⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡜⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠱⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡞⠀⠀⠀⠀│ 
      │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠘⢆⠀⠀⠀⠀⠀⠀⢠⠎⠀⠀⠀⠀⠀│ 
   -1 │⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠑⢄⣀⣀⣀⠔⠁⠀⠀⠀⠀⠀⠀│ 
      └────────────────────────────────────────┘ 
      0                                      100
  0.002988 seconds (6.62 k allocations: 303.188 KiB)

This shows that both println("hello, world") and plot_unicode_sine() do some compilation on their first call. Is this the expected result? Shouldn't PackageCompiler have taken care of precompiling everything that the resulting executable/shared library calls?
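The pattern above is Julia's JIT at work: a method is compiled for a concrete argument-type signature the first time it is called with those types, and the native code is reused afterwards. A minimal, self-contained sketch of that first-call cost in a plain Julia session (`sum_of_sines` and `time_call` are hypothetical helpers, not part of Hello or PackageCompiler):

```julia
# Hypothetical stand-in for plot_unicode_sine(): any method that has not yet
# been compiled for its argument types behaves the same way.
function sum_of_sines(n::Int)
    s = 0.0
    for k in 1:n
        s += sin(2π * k / n)
    end
    return s
end

# Times one call. @nospecialize keeps the compiler from specializing on `f`,
# so f(x) is dispatched at run time and any JIT work for `f` happens inside
# the timed region.
function time_call(@nospecialize(f), x)
    t0 = time_ns()
    f(x)
    return (time_ns() - t0) / 1e9
end

t_first  = time_call(sum_of_sines, 100)   # includes compiling sum_of_sines(::Int)
t_second = time_call(sum_of_sines, 100)   # reuses the cached native code

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

On a typical machine the first number is much larger than the second, which is the same shape as the 0.34 s vs 0.003 s timings printed by the executable.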

You need a snoop file that calls all your functions, so that their compiled code is cached and packaged afterwards:

```julia
# snoop.jl
using Hello
Hello.julia_main(String[])
```

And then:

```julia
PackageCompiler.build_executable(...., snoopfile="path/to/snoop.jl")
```

That said, I could never get the same performance from an executable as when running from a REPL session: code compiled and cached in a REPL session is always much faster. PackageCompiler is useful, but not production-ready.

You can use the SnoopCompile package to generate additional snoop files.
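Roughly speaking, the point of a snoop file is that whatever it calls gets compiled into the image ahead of time. Within a single session the same idea can be tried with Base's `precompile(f, argtypes)`, which compiles a method for a given signature without running it. A minimal sketch, with `scaled_sum` as a hypothetical stand-in for code reached from `julia_main`:

```julia
# Hypothetical function standing in for something julia_main would call.
function scaled_sum(xs::Vector{Float64}, a::Float64)
    s = 0.0
    for x in xs
        s += a * x
    end
    return s
end

# Compile scaled_sum for this exact concrete signature without calling it.
# A snoop file gets the same effect indirectly: the calls it makes are what
# ends up compiled ahead of time.
ok = precompile(scaled_sum, (Vector{Float64}, Float64))

# This call matches the precompiled signature, so it should need little or
# no further JIT work.
result = @time scaled_sum([1.0, 2.0, 3.0], 2.0)
```

On recent Julia versions `@time` also prints a compilation-time percentage whenever it observes JIT work, which makes it easy to spot a signature the snoop file missed.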

I have two toy CLI tools in Julia, InstagramScraper and FileCLI, that I wrote just for fun and experimentation; check ./compile.jl. By the way, I use my fork of ApplicationBuilder to generate the binaries.

Anyway, I had to write all my CLI tools (toy or not) in C#, and some in C++, because Julia still has big problems generating small binaries.
My final compiled build of InstagramScraper was 400 MB, while my C# .NET Core .exe bundle was just 10 MB, both standalone.

Julia is not very versatile yet.


I tried it and got the same results as before: the first call to any of the functions still spends time compiling whatever hasn't been compiled yet. Could it be that snooping does not actually catch every called function?

I'm not sure what the issue with the snoop-file precompile step could be, but I've always noticed that precompile(method, (Types...)) doesn't do what I thought it did. Also, the cache gathered from the snoop file is never the same as the one used by a REPL session; it is just slower.
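One common source of surprise with `precompile(f, (Types...))` is that it only covers the exact signature given, and it reports failure through its return value rather than by throwing. A small sketch (the method `half` is hypothetical):

```julia
# Hypothetical method that accepts any Real number.
half(x::Real) = x / 2

# precompile works per concrete signature: this compiles half(::Int) only.
# half(::Float64) is a separate specialization and would still be compiled
# on its first call.
r_int = precompile(half, (Int,))

# No method matches half(::String). precompile signals this by returning
# false instead of throwing, so a stale or mistyped signature in a list of
# precompile statements goes unnoticed unless the return value is checked.
r_str = precompile(half, (String,))

println("precompile(half, (Int,))    -> ", r_int)
println("precompile(half, (String,)) -> ", r_str)
```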

I feel it is a terrible idea to make CLI tools in Julia at the moment.

Yes, I'm starting to realize that as well. It's kind of sad not to have a proper static compilation pipeline in Julia that would allow it to be used as a more general-purpose programming language, given its amazing features and syntax!

Getting a bit off topic, but have you experimented with --compile=min? For one-shot scripts like a CLI it can be much faster; see e.g. the time comparisons here: `jlpkg` -- a command line interface to Pkg
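A rough way to reproduce that kind of comparison from within Julia is to launch short-lived child processes with the default setting and with `--compile=min`, and time them. A sketch, assuming the `julia` binary next to the running one (located via `Sys.BINDIR` and `Base.julia_exename()`) is the one you want to test:

```julia
# Path to the julia binary running this code.
julia = joinpath(Sys.BINDIR, Base.julia_exename())

# A tiny one-shot "script", standing in for a CLI invocation.
script = "println(sum(abs2, 1:10))"

outputs = Dict{String,String}()
for mode in ("yes", "min")   # "yes" is the default JIT setting
    cmd = `$julia --startup-file=no --compile=$mode -e $script`
    t0 = time()
    outputs[mode] = strip(read(cmd, String))
    println("--compile=", mode, ": ", outputs[mode], " in ",
            round(time() - t0, digits=2), " s")
end
```

Which setting wins depends on the workload: `--compile=min` roughly trades JIT latency for slower execution, so it tends to help short scripts and hurt compute-heavy ones.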


Yes, I did try it for CLI stuff, but I am aiming to build fully compiled applications. I was trying PackageCompiler on some simple tools to see whether it could be used to compile more complex things with Julia as the only programming language.

That's a nice trick you used there, thanks for showing it; I didn't know about that flag. It is great for CLIs that interact directly with the Julia runtime and depend on Julia being installed on the system.

But it doesn't solve the issue for a standalone executable that can be distributed and installed anywhere.

Regarding the extra compilation, maybe try passing --compile=no when running Julia with your custom sysimage. You should get an error whenever Julia tries to JIT a function that wasn't precompiled, which should give us a hint about which functions were missed.
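A sketch of that check driven from Julia itself. The sysimage path is a hypothetical placeholder; when that file does not exist, the snippet only builds and prints the command instead of running it:

```julia
# Hypothetical path to the custom sysimage built with PackageCompiler;
# replace it with the file your build actually produced.
sysimage = "path/to/hello_sysimage.so"

julia = joinpath(Sys.BINDIR, Base.julia_exename())

# --compile=no turns the JIT off, so methods missing from the sysimage
# should be reported instead of being compiled silently.
cmd = `$julia --sysimage=$sysimage --compile=no --startup-file=no -e "using Hello; Hello.julia_main(String[])"`

if isfile(sysimage)
    run(ignorestatus(cmd))   # don't throw if the child process exits with an error
else
    println("sysimage not found; would run: ", cmd)
end
```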

The problem is that even with apparently simple functions, a lot of "code missing for ..." errors get printed. Collecting all of them into precompile statements and rebuilding the system image looks like a workaround for the missing "official" way of building fully compiled executables, which I think is a big drawback for the adoption of the language (at least among the people I know).

In any case, does anyone know whether building fully compiled binaries is being worked on?

PackageCompiler.jl is apparently the official package for this task, so if something like tree shaking is going to happen, it will happen there. Unfortunately, I'm not sure this is their focus at the moment, though the developers are well aware of this request.

These are exactly what I’m saying we need to fix: if these functions were AOT compiled into your sysimage properly, you wouldn’t get those errors anymore (my naive assumption).

Yes, absolutely, but the problem is that there are fewer people working on fixing this than there are people asking for this functionality. Thankfully, there is a lot of low-hanging fruit (in my opinion) that new contributors can easily cut their teeth on. Given the enthusiasm in this thread, I'll list some open issues that we can all work on:

  • Remove unneeded stdlibs from base/sysimg.jl - The set of stdlibs included in the sysimage are listed here, and used during the build process to compile in support for stdlibs, which certain programs might not need (and removing them can save a big chunk of disk and memory usage). A PR to Julia to provide the option to remove these stdlibs at the user’s discretion would probably be a good start; alternatively, one could hack this support directly into PackageCompiler by swapping in a custom-written base/sysimg.jl to achieve the same effect.
  • Certain core functionality may not be required - Add options to Julia's build process that allow one to safely remove things like BLAS, and help fix the src/anticodegen.c file so that building with JULIACODEGEN=0 in Make.user doesn't break the build (such a build would drop runtime compiler support, and with it the very hefty LLVM).
  • Prototype some simple AOT codegen with LLVM in the CUDAnative style - CUDAnative has shown us that it's possible to AOT compile static kernels from within Julia for usage on the GPU. Doing native (CPU) codegen via the same approach, maybe with some tweaks for the specific program, would be an interesting avenue to explore, especially for smaller, simpler programs where dynamic dispatch might not be prevalent. This is pretty close to what "tree shaking" means, as you're only compiling static code paths that are explicitly requested (and you can selectively remove things like exception machinery if you so desire).
  • Fix SnoopCompile/Julia to catch more method signatures - As discussed above.
  • Make things in the sysimage more compact - There’s certainly going to be holes or waste in the sysimage that are consistently unused, maybe due to overzealous padding or code/data duplication. Doing some spelunking into how Julia serializes and deserializes the sysimage should provide some insight into where things can be squished closer together to take up less room overall.
  • Lazy-load parts of the sysimage - Currently, it appears that the entirety of the sysimage is loaded into memory at julia_init, which is probably not strictly necessary. Investigating ways to lazy-load chunks of the sysimage as-needed would reduce memory usage and potentially startup time.
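Before trying any of the size reductions listed above, it helps to know the baseline. A minimal sketch that reports the size of the sysimage the current process was started with; it relies on the internal, unexported `Base.JLOptions()`, so treat it as version-dependent:

```julia
# Path of the sysimage this Julia process was started with: the default
# one, or whatever was passed via --sysimage / -J. Base.JLOptions() is an
# internal API and may change between Julia versions.
sysimg = unsafe_string(Base.JLOptions().image_file)

size_mib = filesize(sysimg) / 2^20
println(sysimg, ": ", round(size_mib, digits=1), " MiB")
```

Running the same snippet under a custom image (e.g. one with stdlibs removed) gives a direct before/after comparison.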

Just found this thread after running into similar problems. For me, using a snoop file that simply wraps the julia_main call worked. However, in both versions (with and without the snoop file), there is an initial phase of about 6 seconds during which nothing visible happens at all, before the 0.000000 seconds line from the @time a = 10 statement shows up.

Any idea where this “delay” comes from? There’s no such long delay when starting Julia… (BTW I tested this on Windows 8.1).

Edit: I just ran compile_incremental using the same snoop file (and a TOML with UnicodePlots as the only dependency), then executed julia --sysimage <path/to/incremented/image> <snoopfile>.jl. It has the usual, almost negligible Julia startup time (about 0.2 s) and then the same performance as the standalone executable. So it seems that the standalone executable does some initialization work that Julia itself does not...?
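To put numbers on that startup comparison, one can time a Julia process that does nothing, once with the default sysimage and once with the incremental one. A sketch; the custom image path is a hypothetical placeholder, and that run is skipped if the file is missing:

```julia
# Wall-clock time to start Julia, evaluate `nothing`, and exit.
function startup_time(extra_flags::Vector{String} = String[])
    exe = joinpath(Sys.BINDIR, Base.julia_exename())
    t0 = time()
    run(`$exe --startup-file=no $extra_flags -e nothing`)
    return time() - t0
end

t_default = startup_time()
println("default sysimage: ", round(t_default, digits=2), " s")

# Hypothetical path to the image produced by compile_incremental.
custom = "path/to/incremented/sys.so"
if isfile(custom)
    t_custom = startup_time(["--sysimage=$custom"])
    println("custom sysimage:  ", round(t_custom, digits=2), " s")
end
```

The standalone `./hello` executable can be timed the same way by passing its path to `run`, which would show whether the extra delay comes before `julia_main` is even entered.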

You should definitely file an issue, but unless you’re willing to investigate this yourself, I don’t think it will be addressed quickly. The maintainers are already extremely overloaded (and there are 95 open issues filed on the repo).

I’ll try the same at home, under Linux. Depending on the outcome, the starting point for investigating this behaviour would probably be different (OS-specific or not).

Edit: Tried it. compile_incremental works fine, but build_executable fails with

```
/home/linuxbrew/.linuxbrew/bin/ld: //home/asprionj/Documents/PortableSW/julia-1.1.0/lib/julia/libLLVM-6.0.so: undefined reference to `std::thread::_State::~_State()@GLIBCXX_3.4.22'
/home/linuxbrew/.linuxbrew/bin/ld: //home/asprionj/Documents/PortableSW/julia-1.1.0/lib/julia/libLLVM-6.0.so: undefined reference to `std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)())@GLIBCXX_3.4.22'
collect2: error: ld returned 1 exit status
```

So, cannot compare it right now.