Taking TTFX seriously: Can we make common packages faster to load and use

I’m talking about the reality of how it works now. By “breaks” precompilation I mean that precompiling unstable functions seems to give little or no reduction in TTFX compared to precompiling stable functions.

The basic reason for this is that with type-stable code, the compiler can trace all the functions your top-level code depends on and precompile all of them. But if the compiler has to trace unstable functions, it loses the ability to know which `MethodInstance`s get called, which means it can’t precompile the recursive dependencies.
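A minimal sketch of that difference (the functions here are made up purely for illustration):

```julia
# Hypothetical example: `unstable` returns either an Int or a String depending
# on a runtime *value*, so inference cannot know which method of `downstream`
# will be reached until the code actually runs.
unstable(x) = x > 0 ? 1 : "one"     # return type depends on a value
stable(x)   = x > 0 ? 1 : -1        # always returns Int

downstream(y::Int)    = y + 1
downstream(y::String) = y * "!"

# For `stable_pipeline`, inference sees that `downstream(::Int)` is reached,
# so precompiling it can also compile and cache the whole call chain.
stable_pipeline(x) = downstream(stable(x))

# For `unstable_pipeline`, the `downstream` MethodInstance is only resolved
# via dynamic dispatch at runtime, so precompilation cannot cache the callees.
unstable_pipeline(x) = downstream(unstable(x))
```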

6 Likes

The basic reason for this is that with type-stable code, the compiler can trace all the functions your top-level code depends on and precompile all of them. But if the compiler has to trace unstable functions, it loses the ability to know which `MethodInstance`s get called, which means it can’t precompile the recursive dependencies.

I get that it applies to precompile statements, but does it also apply to function calls?

1 Like

My only experience in developing packages is a little toy one I created just to learn more about the process; I have not yet found the need to develop my own “serious” package. So, in a sense, I feel like this thread is directed towards people like myself who will need a good deal of help to avoid common mistakes/programming patterns that increase TTFX. Rightly or wrongly, reading the comments gives the impression that there is a very high bar to clear.

Others (here and here) have come up with good “precompilation checklists” to go through for reducing start time. But now consider a “knowledge checklist” one may need in order to implement those suggestions (and the others above):

  • how to interpret the built-in profiler,
  • how to recognize type instabilities and understand @code_warntype,
  • how to read a ProfileView.jl flamegraph,
  • how to use Cthulhu.jl,
  • what causes an “invalidation”,
  • how to use SnoopCompile.jl,
  • how to “properly” use Requires.jl,
  • what makes code precompile-able in the first place,

and there are likely others I’ve missed. Some of these are fairly basic (interpreting the profiler and @code_warntype, reading flame graphs), but others may require more understanding.
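For the @code_warntype item on that list, here is a tiny made-up example of the kind of thing it flags:

```julia
# Hypothetical example function: `s` starts as an Int but becomes a Float64
# inside the loop, so its type cannot be inferred as concrete.
function sumto(n)
    s = 0               # Int...
    for i in 1:n
        s += i / 2      # ...now Float64: type-unstable accumulator
    end
    return s
end

# In the REPL:
#   @code_warntype sumto(10)
# reports `s::Union{Float64, Int64}` (highlighted in red).
# Initializing with `s = 0.0` makes the function type-stable.
```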

Looking at the entirety of the thread, what would help the most is an easy-to-follow tutorial, perhaps with a dummy package repo on GitHub, where each step is shown and explained in order from “low-hanging fruit” to “the hard stuff”. Because right now, again rightly or wrongly, I look at this thread and just think “wow, that’s a lot of work”, so anything to make the process seem more accessible would be huge.

17 Likes

I mean the first time a function is called.

Maybe this is stupid, but: I noticed that @inferred was never mentioned in this thread. I know that other tools allow for more fine-grained inspection, but shouldn’t @inferred catch at least a few type instabilities?

(I’m asking since I use it all the time)
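For what it’s worth, Test.@inferred does catch the simplest cases. A minimal sketch (the functions are made up for illustration):

```julia
using Test

# `@inferred f(x)` calls `f(x)` and throws if the inferred return type does
# not match the type of the actual result, so it can indeed catch basic
# type instabilities inside a test suite.
f_stable(x)   = x > 0 ? 1 : -1       # always Int: inference succeeds
f_unstable(x) = x > 0 ? 1 : "one"    # Union{Int64, String}: inference fails

@inferred f_stable(1)                # passes and returns 1

caught = try
    @inferred f_unstable(1)          # throws: inferred Union{Int64, String}
    false
catch
    true
end
```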

2 Likes

Inspired by this post on TTFX with CSV/DataFrames, I made a quick attempt at a function to run script files repeatedly.

using Statistics

"""
    ttfx(; code="sleep(1)", N=10, args = "", preview=false)

Compute the time to first X.

`ttfx` will run `code` `N` times to determine the total
startup cost of running certain packages/functions. `ttfx()`
with no arguments will simply run `sleep(1)` and may be used
to estimate the base Julia runtime cost.

`code` can either be a short snippet that will run with
the `-e` switch or the path of a file containing the script to be run.

`args` can be used to set Julia runtime options.

`preview = true` will show the final command without running it.

"""
function ttfx(; code="sleep(1)", N=10, args = "", preview=false)
    # If running a short snippet and not a file, add a -e
    if !isfile(code)
        code = "-e '$code'"
    end

    # `cmd` doesn't interpolate properly or something,
    # so using `shell_parse` and `cmd_gen` is the workaround
    ex, = Base.shell_parse("julia $args $code")
    julia_cmd = Base.cmd_gen(eval(ex))

    # Return only the command that would have been run
    preview && return julia_cmd
    
    # Run the command N times
    times = Vector{Float64}(undef, N)
    for i = 1:N
        times[i] = @elapsed run(`$julia_cmd`)
    end

    return median(times), times
end

# Run the default timing with sleep
t = ttfx()

# Run `using CSV` 15 times in the current project with CSV.jl installed and 8 threads
t = ttfx(code="using CSV", N=15, args="-t 8 --project=@.")

For me ttfx() takes a median time of ~1.17 seconds, so there is a baseline julia runtime cost of 0.17 seconds on my machine. Then using CSV on my computer takes a median time of 3.4 seconds.

Feel free to edit and expand this (perhaps into the @ctime macro that was suggested). Code suggestions welcome!

8 Likes

You can clean up the Cmd stuff by doing it like this:

function ttfx(; code="sleep(1)", N=10, args = String[], preview=false)
    # If running a short snippet and not a file, add a -e
    if !isfile(code)
        code = `-e $code`
    end

    julia_cmd = `julia $args $code`

    # Return only the command that would have been run
    preview && return julia_cmd
    
    # Run the command N times
    times = Vector{Float64}(undef, N)
    for i = 1:N
        times[i] = @elapsed run(julia_cmd)
    end

    return median(times), times
end

With this version, pass args as a list of strings, e.g. ttfx(; args=["-O1","--compile=min"]).

6 Likes

Or ttfx(args=`-O1 --compile=min`) works too.

5 Likes

Thanks! I initially tried using both backticks and @cmd to convert the string to a Cmd, but I was getting a strange error from Base.shell_parse, which is the reason for that note in my original function. But for some reason your new version doesn’t give me the same error…

Strings, commands, and arrays all interpolate into commands differently, and when you tried, you probably had something as a string that should’ve been a command, or something like that. I often end up using trial and error, although I really should just learn the rules. In the end, you should usually not need “manual quoting” like code = "-e '$code'", and you definitely should not need eval, Base.shell_parse, etc.
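For reference, a small sketch of the interpolation rules in question: a string interpolates into a Cmd as a single argument (no shell-style word splitting), while an array or another Cmd expands into multiple arguments.

```julia
# An array expands into separate arguments:
args = ["-O1", "--compile=min"]
cmd1 = `julia $args -e "1+1"`
# cmd1.exec == ["julia", "-O1", "--compile=min", "-e", "1+1"]

# A string stays one argument, spaces and all (here producing an
# invalid single flag "-O1 --compile=min"):
s = "-O1 --compile=min"
cmd2 = `julia $s -e "1+1"`
# cmd2.exec == ["julia", "-O1 --compile=min", "-e", "1+1"]
```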

1 Like

Ah, that’s good to know. Trial-and-error is how I ended up with what I had initially.

Here’s a minor rework, with a macro:

using Statistics
function ttfx(; code="sleep(1)", N=2, args = String[], preview=false)
    jl_startup = @elapsed run(`julia $args -e ""`)
    # If running a short snippet and not a file, add a -e
    if !isfile(code)
        code = `-e $code`
    end
    julia_cmd = `julia $args $code`
    # Return only the command that would have been run
    preview && return julia_cmd
    # Run the command N times
    times = Vector{Float64}(undef, N)
    for i = 1:N
        times[i] = @elapsed run(julia_cmd)
    end
    times .-= jl_startup
    return (mean=mean(times), times=times)
end
macro ttfx(code::String, N=1)
    # return an expression so the timing runs when the expanded code runs,
    # not at macro-expansion time
    :(ttfx(; code=$code, N=$N))
end

@ttfx "your code here" 10

1 Like

As an update, TTFX for Blink.jl, Interact.jl, JSON.jl, CSV.jl, and I would guess many other packages is largely dependent on the compilation time of Parsers.jl:

https://github.com/JuliaData/Parsers.jl/pull/108
https://github.com/JuliaIO/JSON.jl/pull/337

With 2200 dependent packages, Parsers.jl is probably responsible for a good fraction of the compilation time in the ecosystem.

My takeaway is not to fix your package’s precompile time, but the lowest-level dependency that is causing the problem. But it’s not always so easy to find where that is, because it’s buried so deep in the stack and usually doesn’t show up in the flame graph. And to take the top of the @snoopi_deep graph more seriously! I could have found this a lot more quickly in hindsight.

19 Likes

Why would something not show up in the flame graph if it takes such a big chunk of the run time?

While we should definitely continue to strive for lower TTFX, I think we should also clearly advertise the solution of using PackageCompiler to create a sysimage. It appears to me that this is a very practical and low-effort method to eliminate TTFX. Of course, creating a sysimage introduces a small inconvenience in the development workflow. However, I have found that in a typical project the set of packages I use stabilizes pretty early on, so there is no need to create a sysimage frequently. Also, importantly, it works perfectly for application deployment, where no new code is added.

Deeper functions don’t show up because they’re taking compile time, not run time. So the flame graph will usually show functions further out that call the slow function, or nested functions that eventually call it.

Blink/Interact (partly via AssetRegistry) will show time in JSON.Parser.parse, and adding precompilation there does help some of the TTFX. But the real problem is in Parsers.jl, 5 method calls deeper. It’s only used on one line to parse floats, and AssetRegistry doesn’t read or write floats. So it’s not exactly what you look for first, unless you know the packages already.

PackageCompiler still takes some knowledge and the compile time problem is most important for the experience of newcomers.

If someone who uses R tries out Julia, they won’t be using package compiler straight away. So loading a tiny CSV and plotting a column from it will take half a minute for them, and Julia won’t feel at all “fast”.

And at the other end, as a developer, package compiler isn’t so useful.

15 Likes

I don’t think custom sysimages should be recommended for general use, before integration with Pkg is improved. See the “drawbacks” mentioned in the page you linked.

With a custom sysimage, as soon as you add a package to your project, you cannot trust Pkg (or the TOML files) anymore about which package versions you are using (you cannot even trust that the versions used are compatible).

It’s OK if you are disciplined and know what you are doing, but if people start using this without second thoughts… Imagine supposedly reproducible publications where it turns out the listed package versions are wrong because of this. This will hurt not only the researchers but also Julia’s reputation.

14 Likes

OK, then you mean the @profile flame graph, not the @snoopi_deep flame graph? I thought you were using the latter; it should show those methods, I think.