Questions about Juliac

At the moment, I am trying to use Julia for a real-time system. So far, I have done a fair amount of benchmarking with Julia.

The script below measures the time of an initial matrix multiplication of two 20x20 matrices and then averages the time of 100 subsequent matrix multiplications. The attached file is the Julia script that gets compiled; for the native Julia run, a call to main() is added at the bottom.
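
Since the attachment isn’t inlined in the post, here is a rough, hypothetical sketch of the kind of script being described (one first multiply timed on its own, then 100 multiplies averaged); the names, the use of time(), and the details are assumptions, not the actual contents of matmul.jl:

```julia
# Hypothetical sketch of the benchmark described above; not the attached matmul.jl.
matmul(A, B) = A * B

function main()
    A = rand(20, 20)
    B = rand(20, 20)

    t0 = time()
    matmul(A, B)                                      # first multiply
    println("first multiply: ", (time() - t0) * 1e6, " us")

    t0 = time()
    for _ in 1:100
        matmul(A, B)
    end
    println("average of 100: ", (time() - t0) / 100 * 1e6, " us")
end

# For the native run, a call to main() would be added at the bottom, as described above.
```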

With the juliac-compiled executable, the overall run takes much less time (~0.2 s) than native Julia (~2 s). However, with juliac the first matrix multiply takes about 6 ms, while in native Julia it takes around 100 us.

I was wondering what could cause the first matrix multiply to take longer in the executable compiled by juliac.

matmul.jl (674 Bytes)


Some suggestions for your benchmark; they don’t explain the 60x timing discrepancy, though:

  • Share your exact compilation and timing commands along with the resulting printouts as a better summary; what you’ve done or are referring to is not clear enough as written. The printouts and your code are short enough to share directly in formatted blocks here, even if you need to collapse some with “Hide Details” blocks; that’s preferable to links or downloads.
  • Use time_ns (UInt64 nanoseconds, :jl_hrtime on v1.11.5) instead of time (Float64 seconds, :jl_clock_now on v1.11.5); it’s used in the loop sketch after this list. I’m not sure, but my understanding is that its resolution is usually better across systems, and it’s what benchmarking methods tend to use.
  • You are very likely not measuring the “Time with compilation” of matmul(A, B). By the time test() executes, it and its statically dispatched callees, such as matmul(A::Matrix{Float64}, B::Matrix{Float64}), have already been compiled under normal circumstances. Even when running @time in global scope rather than inside a method, the compiler can sneak in and compile some of the input expression, which is why the documentation suggests sometimes trying @time @eval to wrap the input in another expression and deter the compiler (see the @time @eval sketch after this list).
  • I’m not sure if this matters for your numbers, but timing one fast call at a time (and 100us seems pretty fast) is known to risk poorer accuracy; for example, a microsecond timer may show 3us or 4us for a single call, but 3141us for 1000 calls and thus a more accurate 3.141us per call. Benchmarking libraries therefore time a number of calls per sample and divide, among other things. I don’t know how feasible it is to get BenchmarkTools.jl or Chairmarks.jl working in a juliac executable, though, so you might have to set up and time your own loop (see the loop sketch after this list). FWIW, @benchmark $(rand(20,20))*$(rand(20,20)) did “10000 samples with 175 evaluations per sample” once on my system, which works out to roughly 150us per sample given the median timing. You can check what to aim for on your system.
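
To make the loop suggestion concrete, here is a minimal sketch of timing a batch of calls with time_ns and dividing to get a per-call average; the function name and matrix sizes just mirror your post:

```julia
# Minimal sketch: warm up once, then time many calls with time_ns and divide.
matmul(A, B) = A * B

function bench_matmul(n_evals::Int)
    A = rand(20, 20)
    B = rand(20, 20)
    matmul(A, B)                      # warm-up call so compilation isn't part of the timing
    acc = 0.0
    t0 = time_ns()                    # UInt64 nanoseconds
    for _ in 1:n_evals
        acc += matmul(A, B)[1, 1]     # use the result so the work can't be optimized away
    end
    t1 = time_ns()
    return (t1 - t0) / n_evals        # average nanoseconds per multiply
end

println("average per 20x20 multiply: ", bench_matmul(10_000) / 1e3, " us")
```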
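
And a small sketch of the compilation point: by the time a method like test() runs, its callees are already compiled, so measuring a compiling first call has to happen at global scope, e.g. with @time @eval as the docstring suggests. The setup here is assumed, not taken from matmul.jl:

```julia
matmul(A, B) = A * B
A = rand(20, 20)
B = rand(20, 20)

@time @eval matmul(A, B)   # first call via eval: includes compiling matmul for these argument types
@time matmul(A, B)         # matmul is compiled now, so this mostly times the multiply itself
```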