Even after PackageCompiler, the binary still runs slower than in the REPL

I am trying to find a way to deploy a program I have written in Julia. A binary seems to be the way to go, since I don’t expect the user to have Julia installed.

So I used PackageCompiler.jl and compiled the hello.jl example from the package’s example directory. The code looks like this:

module Hello

using UnicodePlots

Base.@ccallable function julia_main(ARGS::Vector{String})::Cint
    println("hello, world")
    @show sin(0.0)
    println(lineplot(1:100, sin.(range(0, stop=2π, length=100))))
    return 0
end

end

The compilation took almost 2 minutes.

real    1m50.334s
user    1m54.636s
sys     0m2.404s

Well, if the resulting binary ran as fast as in the REPL, I could tolerate that.
But when I timed the binary, it took

real    0m0.650s
user    0m0.663s
sys     0m0.163s

I timed the same function in the REPL, with a slight modification:

julia> function main()
            println("hello, world")
            @show sin(0.0)
            println(lineplot(1:100, sin.(range(0, stop=2π, length=100))))
       end
main (generic function with 1 method)

It took only 0.02 seconds:

julia> @time main()
0.025196 seconds (12.68 k allocations: 528.891 KiB)

My question: why is the binary slower than the function in the REPL? Is there any way to get the same speed as running the function in the REPL?

Thank you in advance!

It looks like you might have accidentally posted this before you finished writing your question. Could you elaborate a little on what problem you’re having? Also, FYI, you might want to keep an eye on PackageCompilerX. It hasn’t been officially released yet, but people are starting to try it out and it seems (without knowing any of the key players) that it is the future of ahead-of-time compiled Julia programs.

Thank you for answering my question before I finished :). Yes, my fingers went on autopilot and hit Enter before the question was finished. I will look into PackageCompilerX and report back what I find. Thanks

I think I can answer your original question, but I’m not an expert in the Julia compilation process, so don’t take my answer as the final word on the topic. Anyway, I don’t think you should expect your compiled executable to run as quickly as calling the function from the REPL a second time, after it has already been JIT-compiled. When you run a compiled executable, the OS needs to load the Julia runtime before it can run your compiled Julia program. So the more relevant comparison is the following:

Let’s say you have a file called tst.jl with the following script:

using UnicodePlots
function main()
            println("hello, world")
            @show sin(0.0)
            println(lineplot(1:100, sin.(range(0, stop=2π, length=100))))
end
main()

On my desktop:

julia> versioninfo()
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

I get the following when I run that script from the command line:

~ /usr/bin/time -p julia tst.jl
hello, world
sin(0.0) = 0.0
[UnicodePlots line plot of the sine curve: x axis 0 to 100, y axis -1 to 1]
real 1.53
user 1.74
sys 0.34

That 1.53 seconds includes the time needed to launch julia, parse the script, then compile and run it.

Then, when I compile the PackageCompiler example from your original post (which does the same thing as the test script above, of course) and run it from the command line, I get the following times:

real 0.48
user 0.72
sys 0.30

The real time for the compiled program is about 30% of the real time needed to run the script from the command line.

I haven’t tried PackageCompilerX yet, though I’ve been meaning to! I’m not sure if the story will be much different there.

This looks unreasonably fast. Are you sure you hadn’t already run this in the same REPL session?
This is what I get:

julia> @time main()
hello, world
sin(0.0) = 0.0
...
  1.639137 seconds (4.01 M allocations: 196.461 MiB, 11.49% gc time)

Regarding PackageCompiler, unless you prepare the sysimage with “precompile statements” (https://kristofferc.github.io/PackageCompilerX.jl/dev/devdocs/sysimages_part_1/#Recording-precompile-statements-1), you will still pay a compilation cost for each function the first time it is called.
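
To make the first-call cost concrete, here is a minimal sketch using only Base. The functions `work` and `other` are hypothetical stand-ins for the plotting code, not part of the hello.jl example. It times a first and a second call, then uses `precompile`, which asks Julia to compile a method for given argument types ahead of its first call. As far as I understand, the recorded “precompile statements” serve the same purpose when the sysimage is built.

```julia
# Hypothetical stand-in for the plotting code; uses only Base.
work(n) = sum(sin, range(0, stop=2π, length=n))

t_first  = @elapsed work(100)   # first call: includes JIT compilation for an Int argument
t_second = @elapsed work(100)   # second call: reuses the already compiled code
println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")

# Ask Julia to compile `other` for an Int argument before it is ever called.
other(n) = sum(cos, range(0, stop=2π, length=n))
compiled = precompile(other, (Int,))
println("precompile succeeded: ", compiled)
```

On most machines the first call should be noticeably slower than the second, but the exact numbers depend on the Julia version and hardware.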

The timing is from the second run, because that is the speed after compilation. I am trying to achieve the same amazing speed with an AOT-compiled binary.

Is the speed of the second run in the REPL achievable with AOT compilation plus a sysimage? I guess I have to time it myself.

With a precompile file I get:

❯ time AppCompile/bin/UnicApp;
hello, world
sin(0.0) = 0.0
[UnicodePlots line plot output, truncated]
AppCompile/bin/UnicApp  0.11s user 0.05s system 127% cpu 0.128 total

In your REPL example, you don’t measure the time to start Julia itself, which is relevant:

❯ time julia -e ''
julia -e ''  0.06s user 0.03s system 107% cpu 0.085 total
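
The same measurement can be made from inside Julia. This is only a sketch: `time_fresh_julia` is a hypothetical helper name, and `--startup-file=no` is added on the assumption that you want to exclude any personal startup file from the timing.

```julia
# Time a fresh Julia process that evaluates the given code string.
# Base.julia_cmd() returns the command used to launch the current Julia.
function time_fresh_julia(code::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    return @elapsed run(cmd)
end

# "nothing" is a do-nothing program, so this is essentially pure startup cost.
startup = time_fresh_julia("nothing")
println("bare startup: ", startup, " s")
```

Any executable that embeds the Julia runtime has to pay roughly this cost before your own code starts running.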

I have followed the steps in the PackageCompilerX manual. Here are my timings.
I created a custom image that includes “UnicodePlots” and the precompiled functions of hello.jl. This custom image is called hellosys.dylib.

  1. Calling Julia with this custom image to run hello.jl takes about as long as the binary:
$ time julia -Jhellosys.dylib hello.jl
real    0m0.651s
user    0m0.757s
sys     0m0.129s
  2. Starting Julia with hellosys.dylib and running hello.jl in the REPL: the first run took less time than before, but the second run is still much faster than the first:
$ julia -Jhellosys.dylib
julia> @time include("hello.jl")
0.476735 seconds (1.87 M allocations: 94.582 MiB, 13.91% gc time)
julia> @time include("hello.jl")
0.048675 seconds (98.81 k allocations: 5.153 MiB)

Is there any way to get close to 0.04 seconds?

I think what takes most of the time is the using statement:

ufechner@tuxedo:~$ time julia -e "using UnicodePlots"

real	0m0,523s
user	0m0,633s
sys	0m0,198s
ufechner@tuxedo:~$ time julia -e ""

real	0m0,164s
user	0m0,135s
sys	0m0,046s

As far as I understand, this is the time needed for type inference, not for compilation. I have no idea to what degree, or how, this time can be reduced with PackageCompilerX.

But getting below the time needed to load the Julia runtime is probably not possible. The good news is that this time has decreased quite a lot in recent years, but getting down to 50 ms seems like a big challenge to me.

I am thinking outside the box here. Getting the whole sequence (starting Julia, loading the packages, running the code) down to 0.02 seconds is probably very challenging. But how about keeping a Julia process running, and having a terminal command that sends the code to that existing process to run? That would make the command “feel” like it only runs for 0.02 seconds, a bit like in a Jupyter notebook. This sounds possible, but I am not sure how to construct such a process.
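
A rough sketch of that idea, using only the Sockets standard library: a long-running process listens on a local port and answers one-line requests. Everything here is an assumption for illustration. `handle`, `serve` and `ask` are hypothetical names, the port hint 8000 is arbitrary, and `handle` just echoes a string instead of doing real work. For the demo, the server and client run in the same process. In practice the client would have to be a separate, cheap-to-start program, otherwise it pays the Julia startup cost again.

```julia
using Sockets

# Hypothetical stand-in for the real work (e.g. producing the plot).
handle(request::AbstractString) = "hello, " * request

# Server loop: accept a connection, read one line, write one line back.
function serve(server)
    while true
        sock = try
            accept(server)
        catch
            break               # accept fails once the server is closed
        end
        @async begin
            try
                println(sock, handle(readline(sock)))
            finally
                close(sock)
            end
        end
    end
end

# Client side: send one request line and return the one-line reply.
function ask(port, request)
    sock = connect(port)        # connects to localhost
    try
        println(sock, request)
        return readline(sock)
    finally
        close(sock)
    end
end

port, server = listenany(8000)  # picks a free port at or above the hint
@async serve(server)
reply = ask(port, "world")
println(reply)
close(server)
```

The server only runs a fixed function here on purpose. Evaluating arbitrary code received over a socket would be a security risk.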

If you are interested in how package compilation works, I suggest reading through https://kristofferc.github.io/PackageCompilerX.jl/dev/devdocs/intro/. Without knowing what is going on, it is easy to time something the wrong way and draw misleading conclusions.
