[ANN] Announcing LazyModules.jl: delay loading heavy dependencies until their first use

JohnnyChen94 · May 11, 2022, 3:00pm

Disclaimer:

This package is not for end-users; it’s for package developers.
This package does not really make package loading faster; it only delays loading a package until its first use. Sometimes that first use never happens.

Here I take Plots as an example, but this applies to any heavy package. It isn't universally useful, but for certain applications it offers a chance to cut a lot of latency, because your users don't need all the features.

~~This is still under the registration process but I can’t wait to shout it out here!~~

The first question: how fast would you expect using Plots to be? Is 0.5 s practical? Maybe, but when you don't use Plots, 0 s is the fastest.

But why would you load Plots when you don't use it? Another good question. Package developers want to provide all the features, yet some features may be used by fewer than 1% of users while pulling in a large set of dependencies.

This is where LazyModules.jl becomes useful: all you need to do is add the small @lazy macro before your normal import statement.

The following example can be found in the examples/ folder:

module MyLazyPkg

export generate_data, draw_figure
+using LazyModules
-import Plots
+@lazy import Plots

generate_data(n) = sin.(range(start=0, stop=5, length=n) .+ 0.1 .* rand(n))
draw_figure(data) = Plots.plot(data, title="MyPkg Plot")

end

Now…

julia> @time using MyLazyPkg # 🚀🚀🚀🚀🚀
  0.053273 seconds (154.16 k allocations: 8.423 MiB, 97.62% compilation time)

julia> x = @time generate_data(100); # 🚀
  0.000006 seconds (2 allocations: 1.750 KiB)

But when you do the first plot, Plots gets loaded and it’s still slow:

julia> @time draw_figure(x) # 💤💤
  4.454738 seconds (13.82 M allocations: 897.071 MiB, 8.81% gc time, 49.97% compilation time)

Here the 4.4 s is the Plots loading time plus the time-to-first-X (TTFX) latency of the first plot call.

Caveats

A lazy module isn't a real module; it is just a plain struct with getproperty overridden to mimic common module usage. Because of this, a few cases are not supported. For instance, parameterized constructors such as RGB{Float64}(...) are not supported:

julia> using LazyModules
[ Info: Precompiling LazyModules [8cdb02fc-e678-4876-92c5-9defec4f444e]

julia> @lazy import ImageCore as LazyImageCore
LazyModule(ImageCore)

julia> LazyImageCore.RGB(0.0, 0.0, 0.0)
RGB{Float64}(0.0,0.0,0.0)

julia> LazyImageCore.RGB{Float64}(0.0, 0.0, 0.0)
ERROR: TypeError: in Type{...} expression, expected UnionAll, got a value of type LazyModules.var"#f#4"{LazyModules.var"#f#3#5"{Symbol, Module}}
Stacktrace:
 [1] top-level scope
   @ REPL[5]:1
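The "plain struct with getproperty overridden" idea can be sketched in a few lines. This is a minimal sketch under stated assumptions, not LazyModules.jl's actual implementation: LazyModuleSketch is a hypothetical name, the stdlib Dates stands in for a heavy package, and it relies on the non-exported Base internals Base.identify_package, Base.require and Base.invokelatest. Wrapping every callable in a closure also reproduces the parameterized-constructor failure shown above.

```julia
# Hypothetical sketch (not LazyModules.jl's real code): a plain struct that mimics
# a module and loads the real package only on first property access.
struct LazyModuleSketch
    name::Symbol
end

function _load(m::LazyModuleSketch)
    name = getfield(m, :name)  # getfield, because getproperty is overridden below
    pkgid = Base.identify_package(String(name))
    pkgid === nothing && error("package $name is not in the current environment")
    return Base.require(pkgid)  # loads the package, or returns it if already loaded
end

function Base.getproperty(m::LazyModuleSketch, s::Symbol)
    mod = _load(m)
    # Look the binding up in the latest world: the package may have just been loaded.
    obj = Base.invokelatest(getproperty, mod, s)
    # Wrap callables (functions and types) so each call dispatches in the latest
    # world. The wrapper is a closure, not a type, so `Lazy.T{P}(...)` fails
    # just like `LazyImageCore.RGB{Float64}(...)` above.
    if obj isa Function || obj isa Type
        return (args...; kwargs...) -> Base.invokelatest(obj, args...; kwargs...)
    end
    return obj
end

LazyDates = LazyModuleSketch(:Dates)   # nothing is loaded yet
d = LazyDates.Date(2022, 5, 11)        # Dates is loaded here, on first use
println(d)                             # prints 2022-05-11
```

Every call through the proxy pays for a property lookup plus an invokelatest dispatch, which is where a per-call overhead like the one mentioned below comes from.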

This also adds roughly 80 ns of overhead per call, so don't use it for trivial functions.

IMPORTANT: You’ll still need to eagerly load the core packages to ensure the caller can directly work on the function output without hitting the world age issues. See discussion below.

marius311 · May 11, 2022, 5:00pm

I've really wanted something like this too, but one problem with this approach is that while your @lazy macro can invokelatest the decorated function call, it's not as easy to do so for downstream objects returned by the function. E.g. this errors:

using LazyModules
@lazy import ComponentArrays

function foo()
    x = ComponentArrays.ComponentVector(x=1)
    2 * x
end

foo() # errors: the `*` method for ComponentVector is too new for foo's world age
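The same failure can be reproduced without any package. In this hypothetical sketch, load_fake_pkg uses @eval to define a type and a `*` method at run time, standing in for a lazily loaded package; FakeVector, load_fake_pkg and foo_lazy are illustrative names. invokelatest makes the constructor call work, but `2 * x` still dispatches in the older world that foo_lazy started in.

```julia
# Hypothetical stand-in for a lazily loaded package: a type plus a `*` method
# that only come into existence the first time `load_fake_pkg()` runs.
const FAKE_PKG_LOADED = Ref(false)

function load_fake_pkg()
    FAKE_PKG_LOADED[] && return nothing
    FAKE_PKG_LOADED[] = true
    @eval begin
        struct FakeVector
            v::Vector{Int}
        end
        Base.:*(a::Int, x::FakeVector) = FakeVector(a .* x.v)
    end
    return nothing
end

function foo_lazy()
    load_fake_pkg()
    # Like a `@lazy` call: the constructor runs in the latest world, so it works.
    x = Base.invokelatest(() -> FakeVector([1, 2, 3]))
    # This call dispatches in the world foo_lazy started in, where the
    # `*(::Int, ::FakeVector)` method does not exist yet: MethodError.
    return 2 * x
end

first_try = try
    foo_lazy()
catch err
    err
end
println(typeof(first_try))   # MethodError
println(foo_lazy().v)        # [2, 4, 6]: a new top-level call starts in a newer world
```

Only the first call fails, which is part of what makes this class of bug easy to miss in interactive testing.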

At one point I made a similar macro called @dynamic (see here, not in a package yet, or maybe ever) which requires you to put the import in the function, and invokelatests the entire function “from the beginning”, with some hacks to figure out how to re-call the running function with the right arguments. It looks like:

using MyHypotheticalDynamicImportPackage

function foo()
    @dynamic import ComponentArrays
    x = ComponentArrays.ComponentVector(x=1)
    2 * x
end

foo() # works now

which basically expands to

function foo()
    if !is_already_loaded(ComponentArrays)
        @eval import ComponentArrays
        return invokelatest(foo)
    end
    x = ComponentArrays.ComponentVector(x=1)
    2 * x
end
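A runnable version of this "restart the function" expansion, again with an @eval-defined type standing in for the imported package, is sketched below. DynVector, DYN_LOADED, ENTRIES and foo_dynamic are hypothetical names; the entry counter makes the "code above the import runs twice" caveat visible.

```julia
# Hypothetical sketch of the restart expansion, with an `@eval`-defined type
# standing in for `import ComponentArrays`.
const DYN_LOADED = Ref(false)
const ENTRIES = Ref(0)   # counts how often the function body is entered

function foo_dynamic()
    ENTRIES[] += 1       # code above the "import" runs once per (re)entry
    if !DYN_LOADED[]     # stand-in for `!is_already_loaded(ComponentArrays)`
        DYN_LOADED[] = true
        @eval begin      # stand-in for `@eval import ComponentArrays`
            struct DynVector
                v::Vector{Int}
            end
            Base.:*(a::Int, x::DynVector) = DynVector(a .* x.v)
        end
        # Re-run the whole function in the latest world, where both the type
        # and the `*` method are visible.
        return Base.invokelatest(foo_dynamic)
    end
    x = DynVector([1, 2, 3])
    return 2 * x
end

result = foo_dynamic()
println(result.v)    # [2, 4, 6]
println(ENTRIES[])   # 2: the code before the check ran twice
```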

This has some issues too, though: anything above the @dynamic gets called twice, there were some subtleties with closures, and the overhead is much worse than yours.

Anyway, yours definitely fills a certain use-case, although I’m still curious if anyone can think of a more robust solution than either of these two approaches.

JohnnyChen94 · May 11, 2022, 11:59pm

Oh yes, I knew there are still some world-age issues here but forgot to mention them; thanks for pointing it out!
This is also why this trick should not be used by end users directly.

The world-age issue mainly occurs when you haven't loaded the "core" packages that define the methods needed for basic computation. Thus the package still has to eagerly load some core dependencies so that it runs in an up-to-date world age with the necessary types and methods loaded.

We use this trick in ImageIO so that the actual backend is only loaded when certain image formats are used. In LazyModules.jl terms, it is:

Here ImageCore is the core package that ImageIO always loads with a normal using ImageCore, so that Colorant and related methods exist in the current world age. ImageIO also introduces a so-called enforce_canonical_type to ensure that no alien array types are returned to users, which avoids the world-age issue.
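The idea behind returning only canonical types can be illustrated with a hypothetical sketch; this is not ImageIO's actual code. BackendArray, backend_read, load_backend, read_canonical and user_code are made-up names, and BackendArray stands in for an array type owned by a lazily loaded backend. The conversion to a plain Vector{Int} happens inside the invokelatest call, so the caller never touches a type whose methods are too new for its world.

```julia
# Hypothetical sketch of "return only canonical types" (not ImageIO's code).
const BACKEND_LOADED = Ref(false)

function load_backend()
    BACKEND_LOADED[] && return nothing
    BACKEND_LOADED[] = true
    @eval begin
        # Stand-in for an array type defined by a lazily loaded backend package.
        struct BackendArray <: AbstractVector{Int}
            data::Vector{Int}
        end
        Base.size(a::BackendArray) = size(a.data)
        Base.getindex(a::BackendArray, i::Int) = a.data[i]
        backend_read() = BackendArray([1, 2, 3])
    end
    return nothing
end

function read_canonical()
    load_backend()
    # Both the backend call and the conversion run in the latest world, so only
    # a plain Vector{Int}, whose methods every world already has, escapes.
    return Base.invokelatest(() -> Vector{Int}(backend_read()))
end

function user_code()
    x = read_canonical()
    return 2 .* x    # works: no method from the lazily loaded code is needed
end

println(user_code())   # [2, 4, 6]
```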

CarloLucibello · May 12, 2022, 6:00am

An alternative that avoids (all?) world-age issues, also when working with downstream objects, requires a manual intervention: the user has to explicitly import the required package.

An example is the following

module Foo
    using LazyModules: @require

    function foo()
        @require import ComponentArrays
        x = ComponentArrays.ComponentVector(x=1)
        return 2 * x
    end
end

using ComponentArrays # foo() will error out without this 
Foo.foo()

where the @require import ComponentArrays is expanded to

        pkgid = Base.identify_package("ComponentArrays")
        if Base.root_module_exists(pkgid)
            ComponentArrays = Base.root_module(pkgid)
        else
            error("Add `import ComponentArrays` or `using ComponentArrays` to your code
                   to unlock this functionality.")
        end
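A macro with this kind of expansion can be sketched as follows. To keep the sketch simple it takes the bare package name (@require_loaded Dates) instead of an import expression; @require_loaded is a hypothetical name, not an API of LazyModules.jl or Requires.jl, and the stdlib Dates stands in for ComponentArrays. It also checks for nothing, which Base.identify_package returns when the package is not in the environment.

```julia
# Hypothetical sketch: bind an already-loaded package to a local variable,
# or fail with an actionable hint.
macro require_loaded(name::Symbol)
    str = String(name)
    return quote
        local pkgid = Base.identify_package($str)
        if pkgid !== nothing && Base.root_module_exists(pkgid)
            # Escaped so the module is bound to a local of the *caller's* function.
            $(esc(name)) = Base.root_module(pkgid)
        else
            error("Add `import ", $str, "` or `using ", $str,
                  "` to your code to unlock this functionality.")
        end
    end
end

# Usage: the function body never imports Dates itself.
function year_via_require()
    @require_loaded Dates
    return Dates.year(Dates.Date(2022, 5, 12))
end

import Dates                  # the user loads the package...
println(year_via_require())   # ...which unlocks the feature: prints 2022
```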

Now this is very similar to what Requires.jl does, but it has a few advantages for the use cases I have in mind:

The package Foo.jl can declare versions and compat bounds for ComponentArrays.jl.
Foo's code is not conditionally loaded, so we don't need include barriers.
The change for Foo's developer is minimally invasive.
A simple message tells users what they need to do.

A disadvantage I see is that we cannot dispatch in Foo based on types from ComponentArrays.

JohnnyChen94 · May 12, 2022, 8:48am

I like the "to unlock this functionality" error hint. But it no longer hides the dependencies from users, and I don't think that is a good direction.

Take MLDatasets as an example: as a user, I expect to pkg> add MLDatasets and be done. With this approach, when I try MLDatasets, I learn from error messages that I need to add some other package(s), which adds more burden for users. The world-age issue, on the other hand, if handled well, is transparent to users.

CarloLucibello · May 12, 2022, 12:18pm

For the MLDatasets case (xref: "lazy module loading" by CarloLucibello, Pull Request #128, JuliaML/MLDatasets.jl on GitHub),
the problem is that some datasets need to load DataFrames.jl (possibly lazily) and then perform some operations on data frames. The LazyModules.jl approach runs into world-age issues there, so in that case I prefer the approach I outlined above.

JohnnyChen94 · May 12, 2022, 12:57pm

the problem is that some datasets need to load (possibly lazily) DataFrames.jl and then perform some operations on dataframes.

I'm not convinced that making DataFrames and ImageCore lazily loaded is a good direction. A better direction IMO is to split out two smaller packages: MLVisionDataSets and MLTableDataSets. Users who care about load latency should just use the smaller one.

Option 1: using MLTableDataSets
Option 2: using MLDataSets, get a "friendly" error hint, then using DataFrames, and make the same function call again. One might also need to leave a comment like # to use xxx functionality, using DataFrames is required here so that others don't break it again.

Which one is better? As a user I prefer the first one.

But even if you insist on doing it, you don't need a macro for this; a macro doesn't make things nicer than a plain function.

julia> function require_module(m::Symbol)
           pkgid = Base.identify_package(string(m))
           if Base.root_module_exists(pkgid)
               return true
           else
               error("Add `import $m` or `using $m` to your code
                      to unlock this functionality.")
           end
       end
require_module (generic function with 1 method)

julia> function my_advanced_feature()
           require_module(:OffsetArrays)
           return OffsetArrays.OffsetArray(rand(4, 4), -1, -1)
       end
my_advanced_feature (generic function with 1 method)

julia> my_advanced_feature()
ERROR: Add `import OffsetArrays` or `using OffsetArrays` to your code
               to unlock this functionality.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] require_module(m::Symbol)
   @ Main ./REPL[1]:6
 [3] my_advanced_feature()
   @ Main ./REPL[2]:2
 [4] top-level scope
   @ REPL[3]:1

julia> using OffsetArrays

julia> my_advanced_feature()
4×4 OffsetArray(::Matrix{Float64}, 0:3, 0:3) with eltype Float64 with indices 0:3×0:3:
...

CarloLucibello · May 12, 2022, 1:09pm

This is derailing the thread a bit, but I really don't want to split MLDatasets.jl: very few people contribute to it already, and I don't want to increase the maintenance burden.

As for why you need a macro: if my_advanced_feature is inside a module, it won't have access to the package imported outside it (try the Foo example above).

JohnnyChen94 · May 12, 2022, 1:15pm

This works:

function require_module(m::Symbol)
    pkgid = Base.identify_package(string(m))
    if Base.root_module_exists(pkgid)
-        return true
+        return Base.root_module(pkgid)
    else
        error("Add `import $m` or `using $m` to your code
               to unlock this functionality.")
    end
end

function my_advanced_feature()
-    require_module(:OffsetArrays)
+    OffsetArrays = require_module(:OffsetArrays)
    return OffsetArrays.OffsetArray(rand(4, 4), -1, -1)
end
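Below is a sketch of the same function-based approach, placed inside a module to show that returning the module sidesteps the scoping problem raised above. MyPkgSketch and first_day_of_2022 are hypothetical names, Dates stands in for OffsetArrays, and the pkgid !== nothing check is an added assumption: Base.identify_package returns nothing for a package that is not in the environment, which would otherwise surface as a MethodError from Base.root_module_exists.

```julia
# Hypothetical sketch: the package module never imports Dates itself.
module MyPkgSketch

function require_module(m::Symbol)
    pkgid = Base.identify_package(String(m))
    # `nothing` means the package is not even in the active environment.
    if pkgid !== nothing && Base.root_module_exists(pkgid)
        return Base.root_module(pkgid)   # hand the loaded module back to the caller
    end
    error("Add `import $m` or `using $m` to your code to unlock this functionality.")
end

function first_day_of_2022()
    Dates = require_module(:Dates)       # local binding, no import needed in the module
    return Dates.Date(2022, 1, 1)
end

end # module

import Dates                                # done by the user, outside MyPkgSketch
println(MyPkgSketch.first_day_of_2022())   # prints 2022-01-01
```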
