Import a module only when needed

As a strategy for reducing the latency of using SomePkg, is it possible for a package to import a module lazily, only when some functionality is required?
Something similar to what Requires.jl does, but where the user doesn’t have to perform the import themselves.
A failed attempt is the following:

julia> function read_csv(path)
         @eval using CSV, DataFrames
         return CSV.read(path, DataFrame)
       end
read_csv (generic function with 1 method)

julia> CSV # module not available
ERROR: UndefVarError: CSV not defined

julia> read_csv("test.csv")
ERROR: MethodError: no method matching read(::String, ::Type{DataFrame})
You may have intended to import Base.read
The applicable method may be too new: running in world age 31326, while current world is 31351.
Closest candidates are:
  read(::Any, ::Any; copycols, kwargs...) at ~/.julia/packages/CSV/jFiCn/src/CSV.jl:87 (method too new to be called from this world context.)
  read(::Any) at ~/.julia/packages/CSV/jFiCn/src/CSV.jl:87 (method too new to be called from this world context.)
Stacktrace:
 [1] read_csv(path::String)
   @ Main ./REPL[1]:3
 [2] top-level scope
   @ REPL[4]:1

julia> CSV.read("test.csv", DataFrame) # CSV has been imported
1×2 DataFrame
 Row │ a       b    
     │ Int64  Int64 
─────┼──────────────
   1 │     1      2

This seems to work instead:

julia> function read_csv(path)
         @eval begin 
           using CSV, DataFrames
           return CSV.read($path, DataFrame)
         end
       end
read_csv (generic function with 1 method)

julia> @time read_csv("test.csv")
 17.529090 seconds (26.21 M allocations: 1.410 GiB, 5.87% gc time, 90.62% compilation time)
1×2 DataFrame
 Row │ a       b    
     │ Int64  Int64 
─────┼──────────────
   1 │     1      2

julia> @time read_csv("test.csv")
  0.000993 seconds (453 allocations: 37.328 KiB)
1×2 DataFrame
 Row │ a       b    
     │ Int64  Int64 
─────┼──────────────
   1 │     1      2

Any contraindications?


Calling eval (or the macro @eval) inside a function is generally not a good idea. Code evaluated that way is executed as a “top-level” expression in the enclosing module. That means that: (1) it is not compiled together with the rest of the function body, so it performs poorly, and (2) it can change the global state of the module.
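To make point (2) concrete, here is a minimal sketch (the names set_flag and flag are made up for illustration). The assignment inside @eval does not touch the local variable; it creates or overwrites a global in the module where the function is defined:

```julia
# Hypothetical example: `set_flag` and `flag` are illustrative names only.
function set_flag()
    flag = "local"          # local variable inside the function
    @eval flag = "global"   # evaluated at top level: assigns a *global* `flag`
    return flag             # still returns the local value
end

result = set_flag()
println(result)             # the local variable was not changed by @eval
```

After the call, a global named flag exists in the module even though the function only appeared to work with a local one.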

The last “problem” is actually what you want… or perhaps not… By evaluating using CSV, DataFrames, you are also exposing all their exported names in the global scope, and that might cause conflicts. For that kind of thing it is better to let the user be aware of which packages are being loaded (as with Requires.jl).


I usually do not recommend abusing eval, but this seems relatively harmless. Just be very aware that the code inside @eval begin ... end always runs in global scope. So avoid doing anything that would pollute it (like creating new variables), and also remember that CSV may not be available to other functions called afterwards unless: (a) they also use eval; (b) they employ invokelatest; (c) the other functions are called only after the current call stack returns to the global scope and execution continues from there.
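A minimal sketch of option (b), under a few assumptions: the standard-library package Statistics stands in for a heavy dependency such as CSV, lazy_mean is a hypothetical function name, and import is used instead of using so that no exported names are brought into the global scope. It is only an illustration of the pattern, not a recommendation for package code:

```julia
# Hypothetical example: `lazy_mean` is an illustrative name, and the stdlib
# `Statistics` stands in for a heavy package that should be loaded on demand.
function lazy_mean(xs)
    # Load the module on first use. `import` only binds the module name,
    # so its exports (e.g. `mean`) are not added to the global scope.
    @eval import Statistics
    # Run the call in the latest world age, so that the module and methods
    # loaded by the line above are visible to it.
    return Base.invokelatest() do
        Statistics.mean(xs)
    end
end

println(lazy_mean([1, 2, 3]))
```

The price is that everything behind invokelatest goes through dynamic dispatch and its return type cannot be inferred, so it is best kept at a coarse boundary such as a single I/O entry point.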


Another thing to consider: you say that the purpose of “lazy using” is to reduce latency, but that only happens if those packages end up not being used. Otherwise you are just delaying the “problem”, and perhaps making it worse. As discussed in this nice article, one of the important sources of latency is method invalidation, and it is advised that:


My specific use case is MLDatasets.jl, where a single user most likely doesn’t need a large part of the I/O packages imported by the library.

Related issue: “using MLDatasets is very slow” (Issue #126 in JuliaML/MLDatasets.jl on GitHub). In particular, see the comments pointing at the conditional loading mechanism used by FileIO/ImageIO.
