Loading packages in a parallel thread

Sometimes there are packages that are not needed in the first steps of a script, so performance can be improved by running those first steps and loading the “secondary” packages in parallel. For instance:

t = Threads.@spawn firststeps()
using Plots # not needed at all for `firststeps`
result = fetch(t)
plot(result)

However, I’m used to loading packages first, before running anything, so I feel as if there is something wrong with this. (It’s not possible to do it the other way round: `using` must be done at top level, so I cannot `@spawn` it.)

So my question is: is there really something wrong with this approach, or is it ok?
Is it a good idea to load packages in parallel with other computations, as long as those packages are not yet needed?

It should be fine. As far as I know, spawned tasks run in the world age they were spawned in. `using` on the main thread increases the world age, so it shouldn’t interfere with code that is already running.
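Here is a minimal sketch of the pattern from the question. It assumes Julia was started with more than one thread (e.g. `julia -t 2`), otherwise nothing actually overlaps. `firststeps` is a hypothetical stand-in for the real work, and the standard library `LinearAlgebra` stands in for a heavier package like Plots:

```julia
# Hypothetical stand-in for the real first steps; it only uses Base functions.
firststeps() = [sqrt(i) for i in 1:4]

# Start the work as a task that may run on another thread ...
t = Threads.@spawn firststeps()

# ... and load the "secondary" package in the meantime.
# This raises the world age on the main side; the running task is unaffected.
using LinearAlgebra

# Back at top level we are in the latest world, so `norm` is callable here.
result = fetch(t)
println(norm(result))
```

The key point is that the spawned code never calls anything from the package being loaded; only the top-level code after `fetch` does.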

See this for a negative example (trying to access something that is only available in the future from the POV of the spawned task):

julia> f() = begin sleep(10); g() end

julia> t = Threads.@spawn f()                                                                      
Task (runnable) @0x00007f6de219ce70                                                                
                                                                                                   
julia> g() = "hello"                                                                               
g (generic function with 1 method)                                                                 
                                                                                                   
julia> fetch(t)                                                                                    
ERROR: TaskFailedException                                                                         
Stacktrace:                                                                                        
 [1] wait                                                                                          
   @ ./task.jl:322 [inlined]                                                                       
 [2] fetch(t::Task)                                                                                
   @ Base ./task.jl:337                                                                            
 [3] top-level scope                                                                               
   @ REPL[9]:1                                                                                     
                                                                                                   
    nested task error: MethodError: no method matching g()                                         
    The applicable method may be too new: running in world age 31218, while current world is 31219.
    Closest candidates are:                                                                        
      g() at REPL[8]:1 (method too new to be called from this world context.)                      
    Stacktrace:                                                                                    
     [1] f()                                                                                       
       @ Main ./REPL[4]:1                                                                          
     [2] (::var"#1#2")()                                                                           
       @ Main ./threadingconstructs.jl:178                                                         

If g() is never defined, you get a regular MethodError from the spawned task instead, since no matching method exists even in a later world. “World age” is one of the reasons why Julia can be dynamic with eval and still be compiled instead of interpreted.
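If a task really does need to call a method that was defined after it was spawned, `Base.invokelatest` does the lookup in the latest world instead of the task’s own. Here is a small sketch with the hypothetical names `h` and `late`, meant to be run at top level (script or REPL):

```julia
# A method of `h` that already exists when the task is spawned.
h(x::Int) = x + 1

# The task sleeps, then calls `h` on a String through invokelatest,
# so the method lookup happens in the newest world at call time.
late() = begin sleep(1); Base.invokelatest(h, "hello") end

t = Threads.@spawn late()

# Added while the task is still sleeping; a plain `h("hello")` inside the
# task would not be able to see this method.
h(x::String) = uppercase(x)

answer = fetch(t)
println(answer)
```

`invokelatest` goes through dynamic dispatch, so it is slower than a direct call. It isn’t needed for the original question, since `firststeps` never touches Plots.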
