Using types and functions for dispatch purposes without loading all the code of their owner package

Analogous to header files in C/C++, I’m wondering if it’s possible for Julia to introduce a similar concept so that package authors can have smaller dependencies and thus reduce latency.

In valid Julia semantics, this can be done by splitting the package into two packages: 1) a “lightest” package with pure definitions, and 2) the implementations. For instance:

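# FooInterface.jl
module FooInterface
struct MyType end
function f end
end

# Foo.jl
module Foo
# `import` (rather than `using`) is needed to add methods to `f` unqualified
import FooInterface: f, MyType

f(::MyType) = ...
# qualify `map` so this extends `Base.map` instead of defining a new `Foo.map`
Base.map(::typeof(f), x::MyType) = ...
end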

For a package Bar.jl that depends on Foo.jl, the author can instead depend on FooInterface.jl if Bar.jl uses Foo.jl only for dispatch purposes. To implement the functionality of f, Foo usually pulls in extra dependencies, and these dependencies are not necessary to make Bar.jl work.
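A minimal sketch of this split, with the three packages collapsed into sibling modules of one script so that it runs as-is. The relative `..FooInterface` imports, the `describe` function, and the return values are assumptions made purely for illustration:

```julia
# Stand-in for FooInterface.jl: definitions only, no dependencies.
module FooInterface
struct MyType end
function f end            # a function with zero methods
end

# Stand-in for Bar.jl: it only dispatches on MyType and typeof(f),
# so it is defined here without Foo being loaded at all.
module Bar
using ..FooInterface: f, MyType
describe(::MyType) = "dispatched on MyType"
describe(::typeof(f)) = "dispatched on typeof(f)"
end

# Stand-in for Foo.jl: the (possibly heavy) implementation.
module Foo
import ..FooInterface: f, MyType   # `import` allows adding methods to f
f(::MyType) = :implemented_in_Foo
end

println(Bar.describe(FooInterface.MyType()))
println(Bar.describe(FooInterface.f))
println(FooInterface.f(FooInterface.MyType()))
```

Note that `Bar` is defined before `Foo` here: its methods only need the names from `FooInterface`, which is the point of the split.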

By doing this, Bar depends on a lighter set of dependencies, so 1) using Bar can be faster, and 2) maintaining Bar.jl’s compatibility is easier. Will this improve the overall TTFP status? I’m not sure.

The disadvantage is that a package author who follows this strategy has to maintain two packages and two CI setups, open two PRs, trigger Registrator twice… I believe this is why the strategy is not widely adopted in Julia.


I opened this discussion to ask whether Julia could support this “separate out an interface package” strategy at the language level, so that package authors can benefit from lighter dependencies without the extra maintenance burden.

Just for inspiration, the syntax can be something like this:

module Foo

@interface
struct MyType end

@interface
function f end
end

module Bar
@interface using Foo: MyType, f

g(::MyType) = ...
h(::typeof(f)) = ...
end

I don’t think this makes much sense for Julia.

C and C++ have both a compile step and a link step, whereas Julia only has the compile step. Introducing something like what you propose would require a linking step, so nothing is gained for the TTFX problem.
And as soon as you call into the code, you need to import and compile it, so the compilation is really just postponed.

Perhaps what you really want is to depend on only a small part of another package: just the parts you use in your own package. That is more a matter of better design in the package you depend on.

Anyway, a syntax for interface/API definitions could be beneficial; see e.g. this discussion: Base.modules_warned_for silently removed in 1.8

I should perhaps mention that this requested feature would be covered by Proposal for allowing packages to opt-into `import A.B` only loading `B` without loading `A`. · Issue #2005 · JuliaLang/Pkg.jl · GitHub, if that gets implemented.

I think having such “header” packages in subdirectories of the same repo can alleviate some of this. At least, you would only need one PR to update both packages. You would indeed need a more complicated CI setup and to trigger registrator twice, but I think that’s something where the tooling can improve. See Arrow/ArrowTypes and SnoopCompile/SnoopCompileCore for two examples of similar things that both use subdirectories for the smaller package.

I’m sorry if I used the wrong terminology (“C/C++ include files”) and made things unclear. But I disagree with the “compilation is just postponed” argument: not all of a dependency’s compilation is needed to build the functionality you want, especially when you only want to dispatch on the types without calling the functions. “The compilation is reduced and amortized” might be a better description here.

Yes, we (timholy and I) have considered embracing a monorepo architecture to smooth our JuliaImages development workflow. I’ve started writing and playing with a tool for that purpose. I know this will work if we split JuliaImages into more packages with the help of such tooling.

Another thing this strategy can improve is the situation around glue packages and the “which package is the lighter one” debate. See https://github.com/JuliaLang/Pkg.jl/issues/1285#issuecomment-894797529 for more details (oh, you’re the last one to comment there). As a package author, I’m very happy to add pure-interface dependencies with the associated glue code, but I’ll be conservative about accepting PRs that double the loading time just for that. Sometimes the argument only offloads latency from one ecosystem to another, and I don’t think that’s a real solution; see Depend on ArrayInterface by Tokazama · Pull Request #268 · JuliaArrays/OffsetArrays.jl · GitHub

Maybe with good tooling, people will be happier to maintain pure-definition packages; I’m not sure. This is exactly why I opened this discussion: to see if there’s a better solution at the Pkg or Julia level that makes things more intuitive. The Pkg issues have been open for quite a long time, but very little progress has been made.

Yes, I jumped a bit on this.

I am not really the expert here, but my understanding is that Julia code is compiled at the time it is called: if you call a function with arguments of certain types, the function is compiled for that specific call. This is, of course, putting it very generally.
The problem with that was, and is, the TTFX issue. It was later addressed with heavy use of precompilation when you add a package and/or when you use it.
Now you are talking about the problem that all the code is compiled whenever you using/import a package or module.
This part of your topic seems a bit circular to me with respect to the history of compilation in Julia.
But again, that is speaking very generally.
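To make the compile-per-argument-types behaviour described above a bit more concrete, here is a small sketch using `precompile` from Base, which asks Julia to compile a method for a given tuple of argument types ahead of the first call. The function `mean_square` is a made-up example, not taken from any package:

```julia
# Hypothetical example function.
mean_square(x) = sum(abs2, x) / length(x)

# Nothing has been compiled for mean_square yet. Request compilation
# for the signature (Vector{Float64},) before the first call.
ok = precompile(mean_square, (Vector{Float64},))
println("precompile succeeded: ", ok)

# This call can reuse the code compiled above. A call with a different
# argument type, e.g. Vector{Int}, would trigger a new compilation.
println(mean_square([1.0, 2.0, 3.0]))
```

Package precompilation applies the same idea at package-install or first-load time, so that less compilation is left for the first call.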

I agree it could be helpful, e.g. when you just need some type definitions, especially abstract types, to be able to import only those. I am not sure how common this usage is, or how much dependencies would improve if it were supported with special syntax. To me it sounds like a lot of hassle to fix a not very common problem, but I could be very wrong here, since I only have my own experience to go on.

It would be helpful and interesting to analyse some packages with heavy dependencies to see how much they would improve (e.g. in compile time) with the feature you are missing.

From a general language viewpoint, I totally agree: separating definitions from implementations is a good thing. For types this is already possible, but it is not part of any style or programming guide yet. For function interfaces it is not possible for now, but it should not be too difficult to add an interface syntax such as function/method stubs without implementations.

I don’t see any issues for repos or for package maintainers. It’s all about code quality and standards, and separating definitions from implementations typically increases overall quality, which benefits both your own packages and their dependents. So it should pay off for everybody. I just have no idea how large this payoff really is relative to the work needed to make it possible in Julia.
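One way to approximate such stubs with what Julia already offers is the `function f end` form used in the opening post, combined with an abstract type. It declares a function name with no methods, though it does not declare or enforce any signatures. A rough sketch, in which the module names, `AbstractShape`, `area`, and `Circle` are all hypothetical:

```julia
# Definitions only: an abstract type and a documented, method-less function.
module ShapesInterface
abstract type AbstractShape end

"""
    area(shape::AbstractShape)

Return the area of `shape`. Implementations live in other modules.
"""
function area end
end

# At this point the stub exists but has no methods.
n_before = length(methods(ShapesInterface.area))

# Implementation, which could live in a separate, heavier package.
module Shapes
import ..ShapesInterface: AbstractShape, area
struct Circle <: AbstractShape
    radius::Float64
end
area(c::Circle) = pi * c.radius^2
end

n_after = length(methods(ShapesInterface.area))
println("methods before: ", n_before, ", after: ", n_after)
println(ShapesInterface.area(Shapes.Circle(1.0)))
```

The signature in the docstring is documentation only; nothing checks that implementations follow it, which is the gap a dedicated interface syntax would fill.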

Well, I am happy with the things as they are now (except for 2 major things), I don’t feel the pain with these gaps in the language, but that’s just me. I typically take what is and try to use it.

I think importing without loading would help a lot of packages improve their load time: packages that provide some functionality that loads in no time but that want integration with some heavy package ecosystem.

I have contributed to UAParser.jl, which basically matches the user agent string against a list of regular expressions to determine which user agent it really is, and which has DataFrames integration (let’s assume it is important to some users). Without DataFrames.jl as a dependency, this simple package has no significant load time, but DataFrames.jl really slows it down.

Then there is a dilemma:

  • If UAParser adds DataFrames as a dependency, the users who do not need DataFrames integration suffer a longer load time.
  • If DataFrames creates a light interface package, everyone is happy except that DataFrames’ developers will have to maintain two packages, as the OP mentioned.
  • If another package, UAParsersWithDataFrames, is created to provide the integration, UAParser developers will have to maintain two packages.
  • If we allow some kind of interface import, then everyone is happy. The users who do not need DataFrames integration do not suffer a longer load time, and users who do need DataFrames integration have already loaded DataFrames, so they do not suffer a longer load time either.

The granularity of interface import may vary. If we could import types and functions without loading the package, this would effectively be the forward declaration that some have long craved and that may not be realized in the near future. Another possibility is to load only one module of the target package, which is what the PRs Jonny Chen referenced are addressing.

There are quite a few packages that have DataFrames.jl as a dependency, that would otherwise be pretty lightweight, and that don’t need DataFrames at all.

UAParser.jl is exactly this: it could, for example, provide a conversion to a base Julia table (vector-of-namedtuples or namedtuple-of-vectors), or just not do anything: a Vector{DeviceResult} is a table by itself, and can already be converted to another table.
Further, all of these (UAParser.jl/UAParser.jl at c66612c6bcea3176001854d924187d07cfc5e23a · JuliaWeb/UAParser.jl · GitHub) are pretty serious cases of type piracy: they change the behavior of existing conversion methods.
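The base-Julia table conversion mentioned above could look roughly like the following sketch, which needs no dependency at all. The `DeviceResult` struct here is a simplified stand-in whose field names and types are assumptions, not the actual UAParser.jl definition:

```julia
# Hypothetical stand-in for the package's result type.
struct DeviceResult
    family::String
    brand::String
    model::String
end

results = [
    DeviceResult("iPhone", "Apple", "iPhone"),
    DeviceResult("Pixel 6", "Google", "Pixel 6"),
]

# Row-oriented table: a vector of NamedTuples.
rows = [(family = r.family, brand = r.brand, model = r.model) for r in results]

# Column-oriented table: a NamedTuple of vectors.
columns = (
    family = [r.family for r in results],
    brand  = [r.brand for r in results],
    model  = [r.model for r in results],
)

println(rows[1])
println(columns.brand)
```

Both shapes are plain Base data structures, so a user who has already loaded a table package can convert them further on their side.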

Sorry, this is a bad example. I chose UAParser.jl because it is the only package I have contributed to that imports an interface from a much more complex package.