Import a module only when needed

CarloLucibello · May 7, 2022, 8:13am

As a strategy for reducing the latency of using SomePkg, is it possible for a package to import a module lazily, only when some functionality is required?
Something similar to what Requires.jl does, but where the user doesn’t have to perform the import themselves.
A failed attempt is the following:

julia> function read_csv(path)
         @eval using CSV, DataFrames
         return CSV.read(path, DataFrame)
       end
read_csv (generic function with 1 method)

julia> CSV # module not available
ERROR: UndefVarError: CSV not defined

julia> read_csv("test.csv")
ERROR: MethodError: no method matching read(::String, ::Type{DataFrame})
You may have intended to import Base.read
The applicable method may be too new: running in world age 31326, while current world is 31351.
Closest candidates are:
  read(::Any, ::Any; copycols, kwargs...) at ~/.julia/packages/CSV/jFiCn/src/CSV.jl:87 (method too new to be called from this world context.)
  read(::Any) at ~/.julia/packages/CSV/jFiCn/src/CSV.jl:87 (method too new to be called from this world context.)
Stacktrace:
 [1] read_csv(path::String)
   @ Main ./REPL[1]:3
 [2] top-level scope
   @ REPL[4]:1

julia> CSV.read("test.csv", DataFrame) # CSV has been imported
1×2 DataFrame
 Row │ a       b    
     │ Int64  Int64 
─────┼──────────────
   1 │     1      2

CarloLucibello · May 7, 2022, 8:17am

This seems to work instead:

julia> function read_csv(path)
         @eval begin 
           using CSV, DataFrames
           return CSV.read($path, DataFrame)
         end
       end
read_csv (generic function with 1 method)

julia> @time read_csv("test.csv")
 17.529090 seconds (26.21 M allocations: 1.410 GiB, 5.87% gc time, 90.62% compilation time)
1×2 DataFrame
 Row │ a       b    
     │ Int64  Int64 
─────┼──────────────
   1 │     1      2

julia> @time read_csv("test.csv")
  0.000993 seconds (453 allocations: 37.328 KiB)
1×2 DataFrame
 Row │ a       b    
     │ Int64  Int64 
─────┼──────────────
   1 │     1      2

Any contraindications?

heliosdrm · May 7, 2022, 1:54pm

Calling eval (or the macro @eval) inside a function is not generally a good idea. The code evaluated that way is executed as a “top-level” expression. That means that: (1) it is not compiled as usual code in a function, so it has poor performance, and (2) it changes the global state of the module.

The last “problem” is actually what you want… or perhaps not… With using CSV, DataFrames you are also exposing all their exported names in the global scope of the module, and that might cause conflicts. For that kind of thing it is better to make the user aware of which packages are being loaded (as with Requires.jl).
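To illustrate the point about exported names, here is a minimal sketch of the difference between import and using. It is only an illustration: Dates (a standard library) stands in for a heavy package such as CSV, and the module names OnlyImport and WithUsing are hypothetical.

```julia
# `import Dates` binds only the module name `Dates`.
module OnlyImport
    import Dates
end

# `using Dates` additionally makes every name exported by Dates visible.
module WithUsing
    using Dates
end

println(isdefined(OnlyImport, :Dates))  # the module name is bound
println(isdefined(OnlyImport, :today))  # exported names are not brought in
println(isdefined(WithUsing, :today))   # exported names are visible here
```

If the lazy loading is kept, evaluating import CSV rather than using CSV inside the function would limit the new global names to the module name itself.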

Henrique_Becker · May 7, 2022, 2:00pm

I usually do not recommend abusing eval, but this seems relatively harmless. Just be very aware that the code inside @eval begin ... end always runs in global scope, so avoid doing anything that would pollute it (like creating new variables). Also remember that CSV may not be available to other functions called afterwards unless: (a) they also use eval; (b) they employ invokelatest; or (c) they are called only after the current call stack returns to the global scope and execution continues from there.
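Option (b) could look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: Dates (a standard library) stands in for a heavy dependency such as CSV, and the names lazy_today and _today_impl are hypothetical. The import is still evaluated at the top level of the enclosing module, but the code that uses the module lives in an ordinary compiled function, which is reached through Base.invokelatest so that it runs in the newest world age.

```julia
# Ordinary function; the global `Dates` is looked up only when it runs,
# so it can be defined before the module has been imported.
_today_impl() = Dates.today()

function lazy_today()
    # Evaluated at top level of the enclosing module on every call;
    # after the first call the module is already loaded.
    @eval import Dates
    # Call in the latest world age, where the freshly imported module
    # and its methods are visible.
    return Base.invokelatest(_today_impl)
end

d = lazy_today()
println(d)
```

Compared with wrapping the whole body in @eval, only the import goes through eval here, and no argument values need to be interpolated into an expression. The call through invokelatest is still dynamically dispatched, so its return type is not inferred by the caller.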

heliosdrm · May 7, 2022, 2:12pm

Another thing to consider: you say that the purpose of “lazy using” is to reduce latency, but that only happens if those packages end up not being used at all. Otherwise you are just delaying the “problem”, and perhaps making it worse. As commented in this nice article, one of the important sources of latency is method invalidation, and it is advised that:

CarloLucibello · May 8, 2022, 11:34am

My specific use case is for MLDatasets.jl where a single user most likely doesn’t need a large part of the I/O packages imported by the library.

Related issue: “using MLDatasets is very slow” (Issue #126, JuliaML/MLDatasets.jl on GitHub). See in particular the comments pointing at the FileIO/ImageIO conditional loading mechanism.
