What is the cleanest/best way to implement dynamic loading of Julia code?

I am writing here to ask for advice on how to load Julia code at runtime. This is not a question about how exactly any particular mechanism works; rather, I am asking for higher-level advice before committing to a strategy that might turn out to be problematic later.

As a newcomer to Julia (mostly) from interpreted languages such as Python and Perl, I ask myself “How should I go about loading entities containing code?” In the previously mentioned languages there are mechanisms that work quite cleanly, such as importlib.import_module in Python and require in Perl (if memory serves me). On the other hand, these languages are not focused on performance, so it may well be that there are issues that I as a user simply wouldn’t be aware of.

I picture a situation where a certain category of .jl files provide a set of methods (each). These would then later be loaded and called in a runtime determined order, perhaps not loading all of them, perhaps returning to some that had already been used.

Obviously the methods would constitute an interface to these code files (albeit not necessarily an explicitly formalised one). This is not impossible in Julia, but there seem to be a lot of subtle traps and considerations one has to think of. I’ll list a few assertions below that I find hard to reconcile, and perhaps someone would like to comment on (1) whether they are correct and (2) how best to reconcile them (imagine that I’m loath to sacrifice performance if at all avoidable):

  1. All or at least almost all code should be encapsulated in functions.
  2. You can load a .jl file with include and e.g. methods will be (re-)defined at runtime.
  3. Things that are defined at runtime cannot be called normally until you have left the outermost function.
  4. Calling things “not normally” using Base.invokelatest or being clever with eval etc. comes at a price (in my own experimentation, usually that I never get it to work, but I’m a newbie so that may be on me).

Basically my question boils down to: what is the best/preferred/canonical/cleanest way to deal with this?

As I understand it, the problem (manifesting itself as world age issues, other errors or unexpected behaviour) usually boils down to “the compiler cannot guarantee X unless not Y”, Y here being whatever I need to do to load my code. Is there a mechanism for supplying the necessary assertions to the compiler to get these things to work (just like I can assert that the indexing is in bounds)?

Sorry for rambling all over the place, but I don’t want to know a lot of specifics about e.g. world age counters if the consensus is that a completely different mechanism should be used.

It is generally preferred not to do this.
It is, as you have concluded, a rather more advanced topic and less trivial in Julia because Julia is compiled.
It is also hard in other compiled languages like C.

In Julia the issues are things like this: the compiler will inline code from methods, but if you load new code that defines a new, more specific method, then the existing code has to be recompiled to use that method instead.
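To make the recompilation point concrete, here is a minimal sketch. The names `describe` and `report` are made up for illustration. `report` is first compiled against the generic method, and adding a more specific method afterwards forces it to be recompiled the next time it is called from the top level.

```julia
describe(x) = "something generic"
report(x) = "got " * describe(x)

first_answer = report(1)        # compiled against the generic method

# A new, more specific method arrives later (e.g. from freshly loaded code).
describe(x::Int) = "an integer"

second_answer = report(1)       # report is recompiled and picks up the new method

println(first_answer)           # got something generic
println(second_answer)          # got an integer
```

This works without any special care because each statement here runs at the top level. The trouble starts when the new method is defined while an already compiled function is still running.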

What is your overall goal?
Likely there is a way to accomplish this without dynamic code loading.

I really don’t even see the benefit of this. The cost of loading 99.9% of packages is small and there are very active ongoing efforts to shrink the time spent precompiling code (lower the costs).

After the packages are loaded, what is the cost of having additional unused code loaded?

I have often found myself using dynamic loading in different situations, so it would be good to know what the best way to do it is, even though it might be a compromise between speed and flexibility, etc.

In this specific instance the scenario is this:

  • I have files containing a big structure of numbers and a string (and some other objects, but we won’t bother about them). They are saved in BSON format. They are many. Let’s say 1000.
  • I have files containing Julia code that generates a Flux model. They are few, let’s say 20.
  • I want to:
    1. Load the bson file.
    2. Look up which julia file to load (filename in previously mentioned string)
    3. Load and evaluate that file (e.g. with include)
    4. Run a function with a predetermined name and signature (re-)defined in the file, getting a flux model in return
    5. Assign the weights (the big structure mentioned above) to said model
    6. Do some inference with the selfsame model
    7. Next bson file, goto 1

With the amount of data being pushed through the models we can assume that >99% of the time will be spent in step 6.

While it is — of course — interesting to hear that you see no benefit in this, that doesn’t really tell anyone which is the best way to do it (to which the exact answer might of course depend on several parameters).

If on the other hand someone were to say “it is impossible to do this (in a sufficiently good way, let’s say)”, that could probably be valuable information to someone asking this type of question.

That’s for performance reasons. Nothing forbids you from running non-bottleneck code at global scope.

Yes and no: you should make sure that a file is included only once. For example, if you include the same code twice and it defines a struct, you can get an error (depending on the Julia version and on whether the definition changed in between). In contrast, you can import/using your included modules several times without problems.
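One way to make sure a file is included only once is to cache whatever it defines. The following is only a sketch under assumptions: `load_builder!`, `infer`, `run_all` and the function name `build_model` are all hypothetical, and two tiny stand-in files replace the real model files so that it runs on its own. Each file is included into its own anonymous module, and everything that touches the freshly loaded code sits behind a single `Base.invokelatest` call.

```julia
# Include `path` at most once and return the function called `name` that it
# defines. `build_model` is a made-up name for illustration.
function load_builder!(cache::Dict{String,Any}, path::AbstractString;
                       name::Symbol = :build_model)
    get!(cache, abspath(path)) do
        mod = Module()              # fresh namespace per file, so names cannot clash
        Base.include(mod, path)     # evaluate the file at runtime
        # The name was defined after this function started running, so it is
        # looked up in the latest world.
        Base.invokelatest(getfield, mod, name)
    end
end

# Everything that uses the freshly loaded code lives behind this function.
function infer(build, weights, x)
    model = build(weights)
    return model(x)
end

function run_all(records, x)
    cache = Dict{String,Any}()
    results = Float64[]
    for (path, weights) in records
        build = load_builder!(cache, path)
        # One dynamic call per record; inside `infer`, calls dispatch normally.
        push!(results, Base.invokelatest(infer, build, weights, x))
    end
    return results
end

# Stand-in model files so that the sketch is self-contained.
dir = mktempdir()
write(joinpath(dir, "model_a.jl"), "build_model(w) = x -> sum(w) * x\n")
write(joinpath(dir, "model_b.jl"), "build_model(w) = x -> sum(w) + x\n")
records = [(joinpath(dir, "model_a.jl"), [1.0, 2.0]),
           (joinpath(dir, "model_b.jl"), [1.0, 2.0]),
           (joinpath(dir, "model_a.jl"), [0.5, 0.5])]
results = run_all(records, 2.0)
```

The cost of `Base.invokelatest` is paid once per record rather than once per call inside the model, so if nearly all the time is spent in inference the overhead should be negligible.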

Not sure.
This works:

function outer(a)
    inner(x) = x+1
    out = inner(a)
    return out
end
outer(2)

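This too, as long as the freshly included names are reached through Base.invokelatest (in a fresh session a direct call from outer2 hits a world age error, because outer2 is still running in the world from before the include):

function outer2(a)
    include("inner.jl")
    # The names below were defined after outer2 started running,
    # so they are looked up and called in the latest world.
    out2 = Base.invokelatest(() -> InnerA.innerA(a))
    out3 = Base.invokelatest(() -> innerB(a))
    return out2 + out3
end
outer2(10)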

where inner.jl contains:

module InnerA
export innerA
innerA(x) = x+2
end
innerB(x) = x+3
a = 3
println(a+4) # code outside any function
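Whether a direct call to freshly defined code succeeds depends on the world age the caller is running in. Here is a minimal sketch, using `@eval` in place of `include` so that it is self-contained. The names `call_direct`, `call_latest`, `fresh_direct` and `fresh_latest` are made up for illustration.

```julia
function call_direct()
    @eval fresh_direct(x) = x + 1    # defined while call_direct is running
    return fresh_direct(1)           # dispatched in the caller's older world
end

function call_latest()
    @eval fresh_latest(x) = x + 1
    # The closure is run in the newest world, where fresh_latest exists.
    return Base.invokelatest(() -> fresh_latest(1))
end

# In a fresh session the direct call throws ("method too new").
direct_failed = try
    call_direct()
    false
catch
    true
end

latest_result = call_latest()        # 2
```

Note that a second call to `call_direct()` in the same session would succeed, because by then a `fresh_direct` method from the first call already exists in the caller's world. That is one way such code can appear to work during interactive experimentation.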

Performance and security (if the evaluated code is not yours or contains user input) are the first that come to mind…

I do not believe that anything is impossible with Julia.

Also, I was referring to dynamically importing packages, which is the typical use case in which people ask about dynamic loading. Your scenario outlined above seems to suggest something else entirely, but perhaps I am wrong.

However, I think my statements in the previous post probably still apply. Suppose you had 1 file that contained all 1,000+ possible functions you might call, yet in a given scenario you were only expecting to call 1 of them. Is it better to (1) include the single file once or (2) split it into 1,000 small files and cook up some fragile dynamic import mechanism?

Given the trouble with dynamically loaded code, why not consider the following structure:

  • Have files with a big structure of numbers and a string identifying which model you want to use
  • Put all of the code that generates the Flux models in a local Julia package, with an exported function that chooses which model generation code to run depending on the string identifier that it’s passed. This is doable since you have only a small number of distinct model types (20 or so)

Your workflow would then be:

  1. Load the big meta package
  2. Load the bson file
  3. Call the exported function from your package with the string identifier loaded from the BSON file to generate your Flux model
  4. Assign the weights (the big structure mentioned above) to said model
  5. Do some inference with the selfsame model
  6. Next bson file, goto 2
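A rough sketch of what such a package could look like, written here as a plain module so that it runs on its own. The module name `ModelZoo`, the identifiers `"small"` and `"large"`, and the toy builders are all hypothetical stand-ins for the real Flux model code.

```julia
module ModelZoo   # hypothetical name for the local package

export build_model

# Stand-ins for the real model-generating code (one per model type).
small(weights) = x -> sum(weights) * x
large(weights) = x -> sum(weights) + x

# Maps the string identifier stored next to the weights to its builder.
const BUILDERS = Dict{String,Function}("small" => small, "large" => large)

function build_model(id::AbstractString, weights)
    builder = get(BUILDERS, id, nothing)
    builder === nothing && throw(ArgumentError("unknown model identifier: $id"))
    return builder(weights)
end

end # module

using .ModelZoo

model = build_model("small", [1.0, 2.0])
prediction = model(2.0)   # 6.0
```

Since everything is defined before the main loop starts, no world age issues arise and no `Base.invokelatest` is needed.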

I know this isn’t exactly an answer to your question “How do I do X” but it might be that such a workflow would be better suited to the problem that you’re actually trying to solve.

Thank you for refining my understanding of how this works! This is exactly one of the things I asked for (“there is one thing I asked for and this is exactly an answer to it”, not “the number of things I asked for that this answers is exactly one (as opposed to e.g. 1.21)”).

OK, this might actually be the way I will take. It is a bit bothersome in that it can’t be easily called (or could it? perhaps there is some mechanism that I don’t know of) but I guess it could be worked around.

Imagine one were to put the “main” code in a function, keep a global variable representing state, and run main and the code that needs to be global in a loop. Instead of calling the “dynamic” code in main, one could just return after setting the global state (perhaps via the return value). The main function then uses the state information to “remember what it was doing”. Are there any pitfalls with this that I haven’t thought of? One thing that I have thought of is that it will be a bit messy, but I think it could be managed.

Are you sure? If include only means insert this code here, it should depend on what is in that file. Isn’t it allowed to redefine a method? The files I use only have one method definition in it. Of course it defines a lot of other stuff but it’s all local to the above mentioned method.

You are right that in some circumstances one doesn’t have to wait before calling functions defined at runtime. I had overlooked this. Perhaps this can be the solution. I guess that all I have to do is to collocate the loading and the calling of the function (same scope).

Which is the preferable solution: “inside-out main function”, “make sure to call the function in the scope it was loaded”, or some other mechanism (given the situation I have outlined)?

No, of course not. This is what we (almost) always do. I import a module because I need one function. I don’t even think about the “waste” of having the “rest” of the module compiled/in memory etc.

I think that we have just talked past each other.

You could be right. I’ll keep this version in mind if the other thing turns out to be unworkable. I could still keep my models in individual files and just include all of them, I suppose. I would have to put them in different namespaces, but I think that include lets you do that.

You don’t have to put your same-named functions in different namespaces (or files) if you define them as individual methods of a single function. They can be distinguished by an additional argument that encodes the desired algorithm in its type. Then Julia’s type dispatch mechanism will select the appropriate method for you.
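A small sketch of that idea, with made-up type names (`SmallNet`, `LargeNet`) and toy builders standing in for the real model code. Every model gets its own method of the same function, selected by the type of the first argument.

```julia
abstract type ModelKind end
struct SmallNet <: ModelKind end
struct LargeNet <: ModelKind end

# One function, one method per model kind.
build_model(::SmallNet, weights) = x -> sum(weights) * x
build_model(::LargeNet, weights) = x -> sum(weights) + x

# The string stored with the weights still has to be mapped to a kind once.
kinds = Dict{String,ModelKind}("small" => SmallNet(), "large" => LargeNet())

small_model = build_model(kinds["small"], [1.0, 2.0])
large_model = build_model(kinds["large"], [1.0, 2.0])

small_model(2.0)   # 6.0
large_model(2.0)   # 5.0
```

The lookup from string to kind is dynamic, but it happens once per file, and everything after `build_model` returns is ordinary compiled code.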

I wonder if “Requires.jl” might be of interest. It dynamically loads code but for a slightly different use-case.