Loading only what you need

I was looking at SpecialFunctions.jl, which contains a bunch of fairly independent functions, and thinking: if I only want to use one or two specific functions, what’s the overhead compared with just copy-pasting that code into mine?

For instance:

julia> @time import SpecialFunctions: sinint, cosint
  0.315543 seconds (399.21 k allocations: 22.219 MiB, 1.29% gc time)

vs (in a fresh session)

julia> @time include("sincosint.jl")
  0.087249 seconds (260.80 k allocations: 14.275 MiB)

(which is not the best way to do it but you get the idea).

Of course the former is slower because it doesn’t just load sinint and cosint: the rest of the package is loaded too, even though only those two names are brought into scope and no other function from SpecialFunctions is available at this point. If you import other functions subsequently, it’s effectively instantaneous:

julia> @time import SpecialFunctions: ellipk
  0.001191 seconds (433 allocations: 22.875 KiB)

but… say I don’t care about ellipk.

What if I only really need a handful of functions from a package?

I imagine other people have thought about this before, so I was hoping someone could point me in the right direction. I understand that in general it would be difficult and annoying to track down all the code that needs to be loaded, and that the current situation therefore makes sense. Still, I was wondering whether, in some cases, it might be possible to offer a way to do it.

More specifically: would it be conceivable to mark, from within a package, chunks of code that can be loaded independently of the whole (ideally independent functions, or small groups of functions if they share a util or a const)? That might only make sense for a few packages like SpecialFunctions, but it could still be useful.
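To make the idea concrete, here is a minimal sketch, assuming each chunk is a standalone file with no dependency on the rest of the package. The `chunks` table, the `load_chunk` helper, and the toy file contents are all hypothetical; they are not an existing Julia or SpecialFunctions.jl API.

```julia
# Hypothetical package layout: two standalone "chunk" files written to a temp dir.
dir = mktempdir()
write(joinpath(dir, "sigmoid.jl"), """
sigmoid(x) = 1 / (1 + exp(-x))
logsigmoid(x) = -log1p(exp(-x))
""")
write(joinpath(dir, "pairs.jl"), "npairs(n) = n * (n - 1) ÷ 2\n")

# Hypothetical table a package author would maintain: chunk name => file.
chunks = Dict(:sigmoid => "sigmoid.jl", :pairs => "pairs.jl")

# Evaluate a single chunk into a fresh module, leaving the other files untouched.
# The module gets a prefixed name so it cannot clash with a function in the chunk.
function load_chunk(dir, chunks, name::Symbol)
    mod = Module(Symbol(:Chunk_, name))
    Base.include(mod, joinpath(dir, chunks[name]))
    return mod
end

S = load_chunk(dir, chunks, :sigmoid)
S.sigmoid(0.0)          # works: sigmoid.jl has been evaluated
isdefined(S, :npairs)   # false: pairs.jl was never read
```

This sidesteps precompilation entirely, so it says nothing about whether such loading would actually be faster; it only shows the shape of the interface.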

The reasoning stems from a discussion here a week or two ago about the sigmoid and logsigmoid functions which I believe at least half a dozen packages implement in their own way. It seems to me that it would make sense to have banks of optimised functions (like SpecialFunctions.jl) to re-use but without it incurring a significant load time overhead (a reason why, I think, many people try to slim down their dependencies as much as possible) and just load exactly as much code as required. Of course some cases might still warrant customised implementations.

Summary

  • I imagine people have thought about this before. Are there pointers to past discussions I could look at? I had a brief poke around but didn’t find anything.
  • How hard would it be to mark, from within a package, chunks of code that can be loaded independently, and to make it possible to load only those chunks?
  • Is it worthwhile to investigate this, or are there good reasons not to try (e.g. people have tried and it didn’t work)?

Thanks!

Could you do this with nested Modules? Isn’t that sort of what HTTP and Genie do?

I definitely like the idea.

No, it’s just a way to structure code; you always load the full package.
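A small sketch of why, using a hypothetical `Toolbox` module defined inline rather than a real package (the assumption being that loading a package evaluates its top-level module in the same way):

```julia
module Toolbox
    const LOAD_LOG = String[]   # records which submodules were evaluated

    module Trig
        push!(parentmodule(@__MODULE__).LOAD_LOG, "Trig")
        halfangle(x) = x / 2
    end

    module Elliptic
        push!(parentmodule(@__MODULE__).LOAD_LOG, "Elliptic")
        twice(x) = 2x
    end
end

# Bring a single name from a single submodule into scope...
import .Toolbox.Trig: halfangle

# ...yet both submodules were evaluated when Toolbox itself was defined.
Toolbox.LOAD_LOG                # contains both "Trig" and "Elliptic"
isdefined(Toolbox, :Elliptic)   # true
```

The `import` only controls which names become visible; it has no influence on how much of the enclosing module is evaluated.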

The function you want might call some other functions in the package, and which functions might not be possible to infer until runtime.
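A toy illustration of the problem, with hypothetical helper names (`kernel_fast`, `kernel_safe`, `compute`) standing in for package internals:

```julia
# Hypothetical package internals: two helpers that look unused from the outside.
kernel_fast(x) = x + 1
kernel_safe(x) = 2x

# The helper to call is picked from a runtime value (say, a user option), so
# nothing in the source of `compute` says statically which one is needed.
kernels = Dict("fast" => kernel_fast, "safe" => kernel_safe)
compute(kernels, mode::AbstractString, x) = kernels[mode](x)

compute(kernels, "fast", 3)   # needs kernel_fast
compute(kernels, "safe", 3)   # needs kernel_safe
```

A tool that loaded only `compute` plus whatever it could prove was reachable would have to keep both helpers, or risk failing on whichever mode it dropped.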

Yes, definitely (or types, consts, …), so there are many cases where this would not make sense, because you would need to load most of the package anyway and might as well load everything. Where it would make sense is specifically in packages whose functionality can be well separated. Of course, we might wish that each such standalone piece of functionality were its own package, but I don’t think that level of fragmentation is desirable.

The kind of places where this would make sense are the likes of (or parts of) SpecialFunctions, StatsBase, Combinatorics, HypergeometricFunctions, … etc. If there’s a way for package developers to mark chunks of code without it getting in the way of their development, it might help avoid duplication across code bases and ensure people use robust, optimised implementations of key functions.

I wonder if you could do this via a third-party package with sort of “half” of the approach that PackageCompiler takes. Trace the program, steal the appropriate code from the packages that you use, but rather than compile it into a system image, just put it all into a new .jl file that can be included?

Totally not an elegant solution and likely to royally screw some things up but maybe a step in the right direction

Sort of like a “minify” for Julia programs.
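As a rough sketch of the "find the code to steal" step, Julia's reflection can already report where the method behind a given call is defined. The `calls` list and the `locate` helper are hypothetical; a real tool would need a tracer to produce the list, and would still have to extract the source and chase down its dependencies.

```julia
# Calls the program is assumed to make; in practice this list would come from a tracer.
calls = [(sqrt, (Float64,)), (sum, (Vector{Int},))]

# Look up the method that a call dispatches to and record where its source lives.
function locate(f, argtypes)
    m = which(f, argtypes)
    return (name = m.name, mod = m.module, file = String(m.file), line = Int(m.line))
end

for (f, argtypes) in calls
    loc = locate(f, argtypes)
    println(loc.name, " is defined in ", loc.mod, " at ", loc.file, ":", loc.line)
end
```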

I think this is going to run up against pre-compilation. In theory, pre-compilation of a module converts it into a format that can be loaded fast. Maybe pre-compilation could create a different format with an “index” that is loaded first and then used to determine which functions to load, based on what you want to import. However, unless you are talking about a module that is megabytes (pre-compiled) in size, issuing a single read for the whole module is going to be faster than a read followed by multiple small reads to load the methods you are interested in. (This assumes that the CPU time to convert from the saved pre-compiled format to a runnable format is negligible.)
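To show what I mean by an index, here is a toy sketch over an in-memory byte blob. It is not how Julia's precompile cache is actually laid out; `pack`, `read_entry`, and the two source snippets are hypothetical stand-ins.

```julia
# Pack several source snippets into one blob and build an index of
# name => (byte offset, byte length).
function pack(sources)
    io = IOBuffer()
    index = Dict{Symbol,Tuple{Int,Int}}()
    for (name, code) in sources
        index[name] = (position(io), ncodeunits(code))
        write(io, code)
    end
    return take!(io), index
end

# Read only the bytes belonging to one entry, using the index.
function read_entry(blob::Vector{UInt8}, index, name::Symbol)
    offset, len = index[name]
    io = IOBuffer(blob)
    seek(io, offset)
    return String(read(io, len))
end

# Toy stand-ins for precompiled methods (hypothetical content).
sources = [:double => "double(x) = 2x", :square => "square(x) = x^2"]
blob, index = pack(sources)
read_entry(blob, index, :square)   # returns "square(x) = x^2"
```

Each selective load costs one index lookup plus one seek-and-read per entry, which is the trade-off against a single bulk read.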

I suspect the delay is actually the check to see whether the module needs to be re-compiled, i.e. whether any of its dependencies have changed, whether any of their dependencies have changed, and so on down the tree.

Of course this is all a guess on my part, so I could be totally wrong…
