Why does the speed of Julia in AOT compilation differ from UX4?

As the following picture shows, there are two different Julia entries in the speed comparison. The AOT-compiled version is about as fast as Fortran and the other compiled languages, but the other version is slower, closer to Python.

I have a couple of questions:

  1. Could anyone explain what ux4 means?
  2. For ordinary users who follow the guidance in the Performance Tips, which of the two speeds are they most likely to achieve, and how fast is that?

picture source: Latest - Speed comparison

1 Like

This is a question you should pose to whoever made that “speed comparison”. In general Julia can be as fast as C++ or Rust, and faster than C. Keep in mind good performance may require effort, in any language.

Those two data points seem to be comparing two different implementations, one in leibniz.jl and one in leibniz_ux4.jl. The latter unrolls the inner loop 4x (which reduces overhead).
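
For reference, a rough sketch of what 4x unrolling of the Leibniz sum can look like; the actual leibniz_ux4.jl may be written differently:

    # Plain Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    function leibniz(n)
        s = 0.0
        for i in 0:n-1
            s += (isodd(i) ? -1.0 : 1.0) / (2i + 1)
        end
        return 4s
    end

    # Same sum with the inner loop unrolled 4x: four terms per iteration,
    # less loop overhead, and independent accumulators the CPU can overlap.
    function leibniz_ux4(n)   # assumes n is a multiple of 4
        s1 = s2 = s3 = s4 = 0.0
        for i in 0:4:n-1
            s1 += 1.0 / (2i + 1)
            s2 -= 1.0 / (2i + 3)
            s3 += 1.0 / (2i + 5)
            s4 -= 1.0 / (2i + 7)
        end
        return 4 * (s1 + s2 + s3 + s4)
    end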

However, it looks like they are making the usual mistake of including Julia's startup and compilation time in the benchmark, i.e. just timing julia script.jl from the shell.

So, basically the results are misleading for how the performance would be in a more realistic program that does heavy number crunching (you only care about the performance if it takes much longer than a second to run, in which case the one-time compilation overhead is negligible). It’s also a bit unfair to compare it to AOT-compiled languages like C, since you’re not including the time to run the compiler for those languages.
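
For example, the compute time can be measured inside the script itself, after a warm-up call so that one-time compilation is excluded (the function here is just an illustrative Leibniz sum, not the benchmark’s actual code):

    # Illustrative only: a simple Leibniz-series sum for pi
    leibniz(n) = 4 * sum(i -> (isodd(i) ? -1.0 : 1.0) / (2i + 1), 0:n-1)

    leibniz(1_000)                      # warm-up call triggers compilation
    t = @elapsed leibniz(100_000_000)   # times only the numerical work
    println("compute time: ", t, " s")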

This also makes the benefit of loop unrolling minimal, since they are mostly measuring startup time.

This comes up every single time people do a cross-language benchmark, and it’s a bit tiring to address over and over again.

6 Likes

This is how the two benchmarks are executed.

julia:
  # We have to use a special image since there is no Julia package on alpine 🤷‍♂️
  FROM julia:1.8.2-alpine3.16
  DO +PREPARE_ALPINE
  DO +ADD_FILES --src="leibniz.jl"
  DO +BENCH --name="julia" --lang="Julia" --version="julia --version" --cmd="julia leibniz.jl"

julia-compiled:
  # We need the Debian version otherwise the build doesn't work
  FROM julia:1.8.2
  DO +PREPARE_DEBIAN
  RUN apt-get update && apt-get install -y gcc g++ build-essential cmake
  DO +ADD_FILES --src="leibniz_compiled.jl"
  COPY ./src/leibniz.jl ./
  RUN julia -e 'using Pkg; Pkg.add(["StaticCompiler", "StaticTools"]); using StaticCompiler, StaticTools; include("./leibniz_compiled.jl"); compile_executable(mainjl, (), "./")'
  DO +BENCH --name="julia-compiled" --lang="Julia (AOT compiled)" --version="julia --version" --cmd="./mainjl"
2 Likes

Oh, I see that there is a “Julia (AOT Compiled)” line, which removes some of the startup cost.

Yes, but it uses StaticCompiler.jl, which is an unrealistic scenario for most users.

Most realistic would be either a precompiled package or a program compiled with PackageCompiler.
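
For example, PackageCompiler can bake precompiled code into a custom sysimage that is reused across runs (package and file names below are placeholders):

    using PackageCompiler

    # Build a custom system image with the listed packages (and the code exercised
    # by the precompile script) already compiled into it.
    create_sysimage(["BenchmarkTools"];
                    sysimage_path = "leibniz_sys.so",
                    precompile_execution_file = "leibniz.jl")

Subsequent runs then start with julia --sysimage leibniz_sys.so and pay the compilation cost only once; create_app is the analogous route for shipping a standalone program.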

3 Likes

And in any case, why include misleading results (startup-dominated timings) at all in the benchmark? I filed an issue suggesting that they should really just measure the compute time within each language’s code.

But this shows up over and over; it seems kind of pointless to try to correct every random amateur benchmarking attempt.

2 Likes

:100: this

Relevant JuliaCon talk about this benchmark:

2 Likes

The AOT compiled version is as fast as Fortran. A realistic question is: for ordinary users, how can code that does not use StaticCompiler.jl get close to the speed of the AOT version?

The benchmark is not measuring the speed of the numerical code in the non-AOT case; it is mostly measuring startup time. And that is purely an artifact of the benchmark being so short (a fraction of a second) that startup time dominates.

The actual numerical calculation will be the same speed in the AOT and non-AOT cases.

For ordinary users, if you care about numerical performance, it’s probably for code that takes more than a few seconds to run, in which case startup time is irrelevant.

6 Likes

Thank you and everyone else for the reply. :pray:

Depends on what the user is doing. Are they repeatedly running a script from the command line, thus reloading the sysimage and recompiling the script’s code and imports every time? Then the startup time is going to add up. Does the script take a parameter for more iterations, or are they working within a session? Then the startup cost only needs to be paid once. When people argue about what’s fair to include in a benchmark, they’re often arguing over workflows like this. The AOT benchmark shows that when the startup time and the overhead of Julia’s interactivity are omitted, this particular algorithm runs as fast as other compiled languages.
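
A minimal illustration of the within-session workflow (the function is just a stand-in):

    # In a running session, compilation happens once on the first call;
    # later calls pay only the compute cost.
    f(n) = 4 * sum(i -> (isodd(i) ? -1.0 : 1.0) / (2i + 1), 0:n-1)

    @time f(10^8)   # first call: includes compilation of f
    @time f(10^8)   # subsequent calls: compute time only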

Ordinary Julia users are repeating calls within a Julia session, not using StaticCompiler to make a minimal executable. StaticCompiler trims so much overhead that many Julia features can’t be supported. juliac (whose development for v1.12 was recently announced) and SyslabCC (proprietary, usable) aim to support more of Julia, but there is a fundamental feature-overhead tradeoff.

3 Likes