Dependency policy - should we avoid dependencies or embrace them?

Julia’s package ecosystem is very modular, with a big emphasis on putting functionality in packages. Still, discussions here very often implicitly assume that dependencies are a bad thing. I understand why: the more dependencies you have, the greater the risk that one of them will break, especially with new Julia versions. And some packages (cough cough) are very dependency-heavy.

But the Julia community sometimes seems to take this to the extreme, e.g. people are hesitant to use NaNMath simply because it adds a dependency, but depending on NaNMath is IMHO exactly the correct and idiomatic thing to do when you don’t want the new 0.6 NaN behaviour.
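To make that concrete, here is roughly what the trade-off looks like (a sketch, assuming NaNMath.jl is installed):

```julia
using NaNMath

# Since 0.6, Base math functions throw on out-of-domain inputs:
# log(-1.0) throws a DomainError.

# NaNMath provides variants that return NaN instead, which is often what
# numerical code (e.g. an optimizer probing outside the domain) wants:
NaNMath.log(-1.0)   # NaN
NaNMath.sqrt(-2.0)  # NaN
```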

Generally, I would think the best thing for the ecosystem would be for it to be quite coherent, with many packages sharing the same dependencies. Are there any recommendations or authoritative views on this issue?

5 Likes

For me this is very package-specific. If I see that a package is well-written, has good test coverage, not too many outstanding issues, and recent activity, then I am reasonably confident that its maintainer cares enough to fix issues quickly. Sometimes this assumption is tested when I start using the package and open issues. This approach worked out well for the v0.5 => v0.6 transition.

Specifically, if I were to evaluate NaNMath.jl by my own criteria, I would notice that while the build passes, coverage information is not readily available (it seems to be disabled), so the first thing I would do is submit a PR to fix that and see how the author responds.

I like small packages, which ideally do one thing that composes well with other functionality. I won’t mention my favorites here, but I generally start a new project with 4-8 packages in REQUIRE. I think that with a proper CI setup, I get an early warning if things break. I have to admit that I use open-ended version numbers too easily; I should be more cautious about that, fixing bounds and testing before incrementing them.
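For reference, bounding versions in the REQUIRE format looks something like this (the package names and version numbers are purely illustrative; if I recall correctly, a second version number on a line is an exclusive upper bound):

```
julia 0.6
StaticArrays 0.6.0 0.8.0
Parameters 0.8.0 0.10.0
IterTools 0.1.0
```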

What are your favourites?
I start with
  • StaticArrays
  • Rotations
  • Parameters
  • BenchmarkTools
  • DataStructures
  • IterTools
  • PyPlot: plot, imshow

before anything else

I think that small Julia-only dependencies with unchanging APIs are nothing to worry about. I use StaticArrays.jl, RecipesBase.jl, Parameters.jl, DataStructures.jl, IterTools.jl, RecursiveArrayTools.jl, Compat.jl, Requires.jl, Reexport.jl, etc. pretty liberally, as though they are basically part of Julia Base. But the moment a binary dependency is added, there’s something tricky involved (which inevitably fails in some situations), and so that needs a lot more vetting or isolation (i.e. the ability to install without it). Some can be okay, though, like Distributions.jl, which has Rmath, or FFTW.jl. These can be okay since most users probably have them already installed, whether they know it or not.

I may be an outlier but I don’t really care about reducing dependencies. I use any package I need, and if necessary I fork it. But then I don’t have many packages outside of my computer, so I don’t represent the active package developers and maintainers who I suppose you are mainly addressing in your post. Just my 2 cents though.

In addition to the fine libraries mentioned above, I also use

  • ArgCheck.jl, which makes informative assertions much nicer to write.
  • StatsBase.jl, for anything involving statistics.

LambertW.jl (which will be moved to SpecialFunctions.jl) uses

const LAMBERTW_USE_NAN = false

# The branch is resolved when the macro expands, so only one of the two
# expressions is ever spliced into the calling function.
macro baddomain()
    if LAMBERTW_USE_NAN
        return :(return NaN)
    else
        return :(throw(DomainError()))
    end
end

Providing some kind of hook that NaNMath can use might be useful. Maybe generate two versions, and in one of them prepend characters to the identifiers so that they are not exported.
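A minimal sketch of what generating the two variants could look like (all names here are made up for illustration, and the body is a stand-in, not the real Lambert W computation):

```julia
# Generate a throwing variant and a NaN-returning variant of the same
# function; `lambertw_demo` and `nan_lambertw_demo` are hypothetical names.
for (name, onerror) in ((:lambertw_demo, :(throw(DomainError(x)))),
                        (:nan_lambertw_demo, :(return NaN)))
    @eval function $name(x)
        x < -exp(-1.0) && $onerror  # W is real only for x >= -1/e
        return x                    # placeholder for the actual iteration
    end
end
```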

Just to be clear, I wasn’t asking for recommendations for dependencies to take. The title (“dependency recommendations”) was probably ambiguous, so I changed it.
What I wanted to do was to push back a little against what I perceive as an implicit culture in Julia that dependencies should be avoided. I personally don’t think so; I think we should depend on other packages as much as possible, in order to avoid duplicating effort.

FWIW, I have no such perception. Some core developers, eg Tim Holy, have tiny packages that Do One Thing™, which by construction implies that anyone using functionality from them will have a host of tiny dependencies. I find this design approach much more appealing than large, monolithic packages.

There is a practical concern: Pkg2 seems to have a fixed per-package cost for operations. But Pkg3 has overcome this and is very fast; it can pull/update a large number of dependencies in the blink of an eye.

1 Like

I definitely agree here. If you’re writing multigrid code (or other things like that) internally in order to achieve your goals, think about making it a well-documented separate package so others can use that functionality, or try one of the available options. Every package having its own internal GPU array type is a good way for none of them to be great.

4 Likes

I get/have the mentality about dependencies. One of my packages indirectly has 53 dependencies once everything is resolved. Once it’s installed, maybe it’s not a big deal, especially if other packages have already installed things. But still, cloning 53 packages to make a plot just seems crazy.

Well, I agree with that, because plotting is such fundamental functionality that it shouldn’t require 53 packages. I think that’s a special case, though.

That problem is seen with e.g. Turing, which is otherwise well-designed, but depends on Mamba, which in turn has chosen to depend on Gadfly - which ends up being responsible for something like 90% of the volume of installing Turing!

IMO that’s bad design. Plotting functionality should be factored into another package, or made optional with Requires.jl.
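As a sketch of the Requires.jl approach: the module and glue-file names below are hypothetical, and the UUID is a placeholder that would have to be replaced with the real one from the package registry.

```julia
# Sketch: making plotting an optional dependency with Requires.jl.
module MambaLike

using Requires

function __init__()
    # Register a callback: the plotting glue code is only included if the
    # user loads Gadfly themselves. The UUID here is a placeholder.
    @require Gadfly="00000000-0000-0000-0000-000000000000" include("gadfly_glue.jl")
end

end # module
```

The package then installs and loads without Gadfly, but gains the plotting methods as soon as Gadfly is present.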

2 Likes

Agreed

This is just about having a good developer target. If a package is sufficiently large that people are trying to depend on a small portion of it, it should be organized to have a small developer target. No one should depend on a plotting package, just RecipesBase.jl. No one should depend on DifferentialEquations.jl, just the parts you use (DiffEqBase.jl, or things like OrdinaryDiffEq.jl). No one should depend on all of JuliaOpt, just JuMP.jl and whatever solvers you use. Etc.
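For example, the RecipesBase.jl route looks roughly like this (`MyResult` is a hypothetical type standing in for whatever your package produces):

```julia
using RecipesBase

# A hypothetical result type from some package:
struct MyResult
    t::Vector{Float64}
    y::Vector{Float64}
end

# Teach Plots.jl how to draw a MyResult without depending on Plots itself;
# RecipesBase is tiny, so this adds almost no dependency weight.
@recipe function f(r::MyResult)
    xlabel --> "t"   # default attributes the user can still override
    ylabel --> "y"
    r.t, r.y         # the data to plot
end
```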

1 Like

I’ll give my perspective on this as a developer who went through this exercise about a year ago. In my case, I had two objectives: first, to make it easier for people to install LightGraphs without having to download 40+ other packages (it’s a graph library, for pete’s sake), and second, to ease some of the pain surrounding broken packages during Julia version transitions. 0.5 was especially painful because the packages we depend on were being fixed on different schedules. This isn’t to blame any other developer; we all have our own priorities, but it resulted in LightGraphs master being broken for longer than I would have liked.

The result was a formal policy (that will need to be updated now that stdlib has been refactored); if you’re interested, you can read it here (4th bullet).

In summary, I think lightening our dependencies within LightGraphs was the right move, and it provides “pick only what you need” capability: if you need a basic graph library, LightGraphs is it; if you need flow calculations that depend on JuMP, then you can add that package as well.

I think the answer is both yes - and no.

…but what you really should be keeping an eye out for are interdependencies in your software (not necessarily talking about package dependencies here). If you become diligent about reducing interdependencies, I believe the question about package dependencies will simply boil down to “is this package of good quality - or should I build my own”.

Why do we want dependencies?

In my experience, code becomes much cleaner/easier to maintain with a careful construction of the software layers.

When you start writing code, it is really easy to build a solution where multiple components are interdependent. This situation causes a lot of headaches as the complexity of a program increases. Developers have trouble deriving a good mental model of the solution - and then the implementation starts to degrade.

Layering is a good way to mitigate this problem. If you need a particular feature - like NaN math - then you should push it to its own “subsystem”. Ideally, that subsystem should only depend on what it absolutely needs in order to work. If you do this well, I believe you naturally start developing packages similar to those of Tim Holy, that “Do One Thing” (as @Tamas_Papp stated).

→ Whether or not you use a 3rd party package, build your own, or integrate that “subsystem” directly in the software you are deploying, the important part is to layer things properly. By “properly”, I guess I mean that the solution just feels natural and appropriate (not forced) given the problem at hand.

This is not as trivial as you might think. Most people will probably have to put a lot of thought into their software in order to reduce this interdependency. It is actually quite annoying, because the solution just seems to make a lot of sense when you build it well… but I can honestly say that my own solutions are far from harmonious when I start writing them.

Why do we want to avoid dependencies?

In order to remove interdependencies, we must try to eliminate cyclical dependencies.

A good example of this resides in most modern GUI toolkits I have used myself. Modern toolkits use some sort of callback mechanism that shows up under different names:

  • Callback functions
  • Signal/slots
  • Event handlers (MS products)

These “event handlers” (I sort of like this name) are a great way to decouple widget code from that of the user application. A widget does not have to know anything about the target application in order to be useful.

How to reduce interdependencies in Plots.jl?

Well, one example that comes to mind is to build a clean interface to plots that does not depend on any of the backends directly. Let’s call this module PlotsBase.

The next layer up would simply be the implementation of the PlotsBase interface. Moreover, to avoid being stuck with a master module that knows about all plotting backends, one would have to create separate modules for the different implementations:

(PlotsBase, GR) <- GR_Plots
(PlotsBase, PyPlot) <- PyPlot_Plots
(PlotsBase, InspectDR) <- InspectDR_Plots
...

But what about the users that want a plug-and-play experience - and don’t want to pick their plotting backend with a using statement, but would rather call backend(:gr)?

→ They can instead call the higher-level wrapper module called Plots:

(PlotsBase, GR_Plots, PyPlot_Plots, InspectDR_Plots, ...) <- Plots

What’s so good about this new Plots hierarchy?

If a user/company now wants to build their own private plotting backend compatible with the PlotsBase module, they can do so without having to maintain a modified version of the Plots.jl package. They can just use PlotsBase as-is, because PlotsBase does not require implementation details of the plotting backends.
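A minimal sketch of this layering, following the naming above (all module, type, and function names are hypothetical):

```julia
# The backend-agnostic layer: defines the interface, knows no backends.
module PlotsBase

abstract type AbstractBackend end

# Every backend implementation provides a method for this function:
function render end

end # module

# One backend implementation module, e.g. the hypothetical GR_Plots.
# It depends only on PlotsBase (and, in reality, on GR itself).
module GR_Plots

import ..PlotsBase

struct GRBackend <: PlotsBase.AbstractBackend end

# A real implementation would call into GR here; this stub just
# passes the plot specification through.
PlotsBase.render(::GRBackend, plotspec) = plotspec

end # module
```

A private backend only has to define its own `AbstractBackend` subtype and a `render` method; PlotsBase itself never changes.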

2 Likes

This is what’s being done with Makie.jl.

I totally agree with this. Most of what I want to say has already been said in this thread. Depending on a package is no different from the implicit using Base, as long as the quality of the included packages follows the above-mentioned principles. While the packages I include most often may reflect my specific field, it’s the packages @Tamas_Papp mentioned (and e.g. @tim.holy writes) that lead to really cool innovations.

One thing that occurred to me: it may be important to differentiate dependencies for developers vs dependencies for users. It sounds as if the majority of folks here are talking about reducing dependencies within their own packages by splitting code up into separate packages based on their requirements. This leads to a proliferation of smaller packages, which means users of those packages will have (lighter-weight) dependencies.