Binary dependencies


#1

Following up on this discussion about CxxWrap.jl, I would like to discuss some of the choices that can be made regarding binary dependencies:

  1. Build from source or download binaries?
  2. Central or per-package installation prefix?
  3. Role of distribution package managers?
  4. Role of external package managers such as Conda or Homebrew?

Regarding 1, ideally the user should be able to set a global default option for what they prefer. The default could also depend on the Julia version in use, e.g. use BinaryBuilder-provided binaries if the official Julia binary is running, or prefer building from source if a built-from-source Julia is running. The setting could be as simple as an environment variable, or something more involved.
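A minimal sketch of such a setting, assuming a hypothetical environment variable JULIA_BINDEPS_PREFERENCE (not an existing Julia option). How to tell an official build from a source build is left to the caller here:

```julia
# Hypothetical global preference for binary dependencies.
# JULIA_BINDEPS_PREFERENCE is an illustrative name, not a real Julia setting.
@enum BuildPreference DownloadBinaries BuildFromSource

function binary_preference(env::AbstractDict = ENV; official_build::Bool = true)
    choice = lowercase(get(env, "JULIA_BINDEPS_PREFERENCE", ""))
    if choice == "binary"
        return DownloadBinaries
    elseif choice == "source"
        return BuildFromSource
    elseif isempty(choice)
        # No explicit setting: follow how Julia itself was obtained.
        return official_build ? DownloadBinaries : BuildFromSource
    else
        error("unknown JULIA_BINDEPS_PREFERENCE value: $choice")
    end
end
```

An explicit user setting always wins; the build type only decides the default.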

For 2, if there are interdependent binary packages, having a per-package installation prefix can be a pain, so from my perspective it would be nice to have the option of a central prefix shared by all binary packages. I’m not sure whether that gives rise to unresolvable conflicts, however.

For 3 and 4, I think that if Julia is available from a package manager, binary dependencies should be provided using the same system if possible.

One concrete use case I have on our CentOS cluster is that we compile a whole series of high-performance libraries with a built-from-source GCC, and it would be ideal if those binaries could be reused from within a built-from-source Julia, without needing to configure each package separately.

CC @StefanKarpinski @staticfloat @tkelman @SylvainCorlay


#2

We have some general plans to give each external library that needs to be loaded a UUID. Then, all that somebody else building binaries would have to do is provide a mapping from that UUID to their equivalent library (e.g. HPC centers can integrate this with their module system, distributions can provide a way to map it to their packages). What exactly the declaration of this mapping would look like is a design discussion to be had (probably after we’re done designing and implementing the UUID bit). However, our first priority will be to get everything working with BinaryBuilder, such that if you’re using the julialang.org distribution of julia, everything will just work for every package. I do note that producing this mapping will be a lot of work for distribution maintainers, but hopefully we can provide some tools to make it easier (e.g. generating an overview of which packages work, which don’t, and which binary dependency would provide the most bang for the buck, as well as automated testing).
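To make the idea concrete, here is a rough sketch of a provider mapping with a BinaryBuilder fallback. The UUID, the SystemProvider type and the resolve function are all made up for illustration; the actual declaration format is, as said above, still to be designed:

```julia
using UUIDs

# Hypothetical identifier for an external library; this UUID is made up.
const LIBFOO = UUID("c1b2f3a4-0000-4000-8000-000000000001")

# A provider-supplied mapping (distribution, HPC centre, ...) from library UUID
# to whatever that provider calls the library.
struct SystemProvider
    name::String                 # e.g. "debian", "our-cluster"
    mapping::Dict{UUID,String}   # library UUID => provider's package/module name
end

# Use the provider's library if it declares one for this UUID; otherwise fall
# back to the default (BinaryBuilder-provided) binary.
function resolve(provider::SystemProvider, lib::UUID)
    haskey(provider.mapping, lib) && return (:system, provider.mapping[lib])
    return (:binarybuilder, string(lib))
end
```

A provider only has to list the libraries it has actually tested; everything else keeps working through the fallback.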

There’s also a question about how to mix and match libraries that we want to provide and those from a distribution. I think the best thing to do here would be to give BinaryBuilder the ability to create a “virtual shard” with that distribution’s compilers and runtime support libraries. Hopefully, with the above-mentioned tooling, any missing binaries could then be filled in mostly automatically.


#3

OK, sounds good, having per-library control like that is certainly fine-grained enough to make anything possible, and I suppose higher-level tooling to manage things could even be added through packages.


#4

I must say that Keno’s proposal to give external dependencies UUIDs worries me. It looks like a system that would be completely ad hoc to Julia… Package managers would need to create tools to map it to their packages, etc.

On the other hand, if Julia behaved like any standard C++ package, meaning that it is installed in a given installation prefix, then the default way of dealing with external dependencies would be to look for them in that same prefix… That would make packaging Julia with Debian / Conda / RPM / Homebrew just like packaging any other package.


#5

I am very interested in this topic. I would appreciate it if @keno would go into more depth on the UUID stuff.

However, as an HPC person, alarm bells are going off. Permit me to exaggerate for effect.
Firstly, @barche mentions Conda and Homebrew. All good, but I feel that the current model for a lot of software development, whether in Julia or Python or…, is someone sitting at their own MacBook. Then, once they have something running, they expect it to run on an HPC cluster that may not duplicate their personal environment.

The philosophy of BinaryBuilder.jl says:
No more struggling with system package managers. No more needing sudo access to install that little mathematical optimization library.
OK, I get that. HOWEVER, please consider an HPC cluster setup. You are likely to have a very small home directory, mounted as an NFS share. You are not intended to compile and use software from that disk. There is likely to be some huge, fast, central storage.
So what’s the problem with an NFS home directory? When you start code simultaneously on many, many compute nodes, the libraries get pulled in… from the same NFS server, and things get bogged down horribly. That is why those expensive parallel filesystems are there.

I would hope that with Julia packaging we can create a system where, yes, of course developer X can develop on a MacBook, and then find that, with some combination of modules/environment variables, the friendly admins at the HPC centre can provide the required binary dependencies so the code runs seamlessly on the cluster.

Sorry if this is an incoherent rant.


#6

I’m not entirely sure what you’re suggesting we do instead here. Package managers are of course free to create their own julia-* package for every binary dependency and simply dump a copy of our binaries in there. However, that would duplicate almost every library, and the Linux distributions at least are unlikely to do that. Thus, to make a Julia package work with a system package manager (and the libraries provided by it), somebody has to go through and:

  1. Test that Julia package against the stack of system libraries that the package manager provides.
  2. Change that system’s build script to apply any patches or configuration changes that the julia package assumes and work with the upstream maintainer to get those into the standard distribution.
  3. Record the UUID=>system package mapping in a registry somewhere.

I hope you’ll agree that 3 is by far the easiest of these three steps (which ideally need to be performed on a continual basis, with CI systems set up to validate that it keeps working). We’ve had real problems with distributions just taking whatever setup/configuration the dependencies happened to have and calling it a day. That leads to crashes, bugs and a horrible user experience, because nothing works as expected. We’re happy to work with distributions to get julia working well, but for each distribution somebody is gonna have to do a significant amount of work to make sure it works. Not because the mapping needs to be produced (that’s the easy part), but because somebody has to go through and make sure everything actually works. As I said, we’ll try to provide tools to make this easier.


#7

Yes, I’m aware of the constraints of an HPC environment. The UUID idea is basically the following. Rather than saying something like:
dlopen("libfoo.so") (or the equivalent in ccall), you’d say something like
dlopen(Library("0000-0000-...", "libfoo.so")) (or dlopen(Library("LibFoo", "libfoo.so")), where the LibFoo -> UUID mapping uses the standard package manager name resolution mechanism), and the system would figure out how to get that library for you and open it. Basically, when an HPC center decides to support julia, they’d provide a registry mapping UUIDs -> instructions for how to get each library (i.e. they’d basically do the same thing as a generic Linux distribution). So when I say that I want the UUID corresponding to LibFoo, that registry might tell me that I have to load the libfoo module (in whatever version), and julia would go ahead and activate that (or just find the place on the file system where it’s stored).
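A rough sketch of how such a Library handle and a site registry could fit together. Library, the name table, the recipe types and locate are all hypothetical (none of them exist in Julia), and actually activating an environment module is left out:

```julia
using UUIDs

# Hypothetical handle for an external library; no such type exists in Julia today.
struct Library
    id::UUID
    soname::String
end

# Hypothetical name => UUID table, standing in for Pkg-style name resolution.
const LIBRARY_NAMES = Dict("LibFoo" => UUID("c1b2f3a4-0000-4000-8000-000000000001"))

# Allow Library("LibFoo", "libfoo.so") as well as Library(uuid, "libfoo.so").
Library(name::AbstractString, soname::AbstractString) =
    Library(LIBRARY_NAMES[name], soname)

# What a site (HPC centre, distribution) might publish for each library UUID.
abstract type Recipe end
struct EnvModule <: Recipe      # "load this environment module"
    name::String                # e.g. "libfoo/2.1"
    prefix::String              # where that module installs the library
end
struct FixedPath <: Recipe      # library already sits in a known directory
    dir::String
end

# Work out which module (if any) to activate and which file to hand to dlopen.
function locate(site::Dict{UUID,Recipe}, lib::Library)
    recipe = site[lib.id]       # KeyError if the site has no recipe for this UUID
    if recipe isa EnvModule
        return (recipe.name, joinpath(recipe.prefix, "lib", lib.soname))
    else
        return (nothing, joinpath(recipe.dir, lib.soname))
    end
end
```

On a cluster, the returned path would then be passed to Libdl.dlopen once the named module has been loaded, so the library comes from the site's parallel filesystem rather than a home directory.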


#8

The way I see this is that most package managers will in fact adopt the prefix installation approach, that is, a base directory containing the standard include, lib, bin, share, etc, var directories, where each package puts its own assets, Julia included.

On a Unix system, that would typically be a usr directory. In the case of conda, the prefix is the root environment directory; in the case of a conda environment, it is the environment directory.

Now, what I am suggesting is that the official Julia download would get you the equivalent of the content of an installation prefix, including Julia and its dependencies. If you then have some Julia packages wrapping a large native library (e.g. Qt), Qt would simply need to be installed in that prefix, instead of being vendored by the Julia package.

This would make the life of package managers much, much easier, and you could still have a special mode where Pkg would be responsible for installing these things when the user got Julia from the binaries that you distribute instead of from some package manager.

In this context, a package like libcxxwrap would be a peer of Julia in that installation prefix.
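A rough sketch of the lookup this model implies: search a list of installation prefixes, Julia's own first, for a library. The function names are made up, Julia's prefix is assumed to be the parent of Sys.BINDIR, and a real lookup would also have to handle versioned sonames such as libfoo.so.1:

```julia
using Libdl

# Julia's own prefix, assumed to be the parent of the directory holding the julia binary.
julia_prefix() = dirname(Sys.BINDIR)

# Return the first existing `<prefix>/lib/<name>.<ext>` (bin on Windows), or nothing.
function find_in_prefixes(name::AbstractString, prefixes)
    libdir = Sys.iswindows() ? "bin" : "lib"   # Windows keeps DLLs next to executables
    for prefix in prefixes
        candidate = joinpath(prefix, libdir, "$name.$(Libdl.dlext)")
        isfile(candidate) && return candidate
    end
    return nothing
end
```

For example, find_in_prefixes("libcxxwrap_julia", [julia_prefix()]) would find a libcxxwrap installed as a peer of Julia, with no per-package configuration.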


#9

What would be really nice to avoid is the sort of situation the R community ended up in, with packages like RcppArmadillo and RcppEigen vendoring Armadillo and Eigen even in the Debian packages (violating Debian policy), or the sort of mixed mode in Python, where pyzmq vendors a libzmq library in the case of PyPI and relies on some separate libzmq build otherwise.

This whole vendoring fiasco makes distributing packages that depend on multiple languages really hard, because each language decided to have its own idiosyncratic packaging system instead of going the Unix way.


#10

The R model doesn’t sound very appealing to me. Distributions provide a few packages in their official repos, but they are quickly outdated, and most packages are only available from CRAN. In the end that’s quite messy. Since we cannot expect distributions to include all Julia packages and keep them up-to-date, I’d rather not include any Julia packages in official distribution repos.

OTOH I agree it can make sense for Julia packages to use system libraries when possible, in particular to ease the interaction with other libraries and applications. But that can only be done on a case-by-case basis, after distribution developers have reviewed these libraries and checked that everything works. And if the system libraries are outdated, Julia packages have to download an up-to-date version.


#11

Well, that’s the crux of the matter though. Who’s responsible for putting a correctly built version of Qt there, and who do you get it from?


#12

I think BinaryBuilder is great here for the official Julia binaries. The question is whether the prefix should be inside the package directory, or whether there should be, for example, a single prefix per environment. The advantage of a single prefix is that it simplifies configuration of binary packages that depend on each other, and on Windows this directory could be added to the PATH so the DLLs of dependencies do not have to be loaded manually by the package. The downside is that there might be conflicts…
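To make the conflict question concrete, here is a small sketch (hypothetical function, made-up file lists) that merges the files of several binary packages into one shared prefix and reports every relative path claimed by more than one package. Whether two identical copies of a file should count as a conflict is a policy choice this sketch does not make:

```julia
# contents: package name => files it installs, relative to the shared prefix.
function merge_into_prefix(contents::Dict{String,Vector{String}})
    owner = Dict{String,String}()                # relative path => package that owns it
    conflicts = Tuple{String,String,String}[]    # (path, first owner, later package)
    for pkg in sort!(collect(keys(contents)))    # deterministic order
        for relpath in contents[pkg]
            if haskey(owner, relpath)
                push!(conflicts, (relpath, owner[relpath], pkg))
            else
                owner[relpath] = pkg
            end
        end
    end
    return owner, conflicts
end
```

With per-package prefixes this check is unnecessary, which is exactly the trade-off described above.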


#13

The rough plan is to not have a prefix at all (and have the julia dynamic library loader resolve the dependencies appropriately) or have a single prefix per environment that installs the correct versions of things, in order to make sure that a MANIFEST is always reproducible.
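Without a shared prefix, a loader has to open each library's dependencies before the library itself (the Windows DLL case from #12). One standard way to get that order, assuming the dependency graph is already known, is a depth-first topological sort; the function and library names below are only illustrative:

```julia
# deps: library => libraries it links against. Returns a load order in which
# every library comes after all of its dependencies.
function load_order(deps::Dict{String,Vector{String}})
    order = String[]
    state = Dict{String,Symbol}()   # :visiting while on the DFS stack, :done afterwards
    function visit(lib)
        s = get(state, lib, :new)
        s === :done && return
        s === :visiting && error("dependency cycle involving $lib")
        state[lib] = :visiting
        for dep in get(deps, lib, String[])
            visit(dep)
        end
        state[lib] = :done
        push!(order, lib)
    end
    for lib in sort!(collect(keys(deps)))
        visit(lib)
    end
    return order
end
```

Because the order is derived from the recorded versions in the environment, it stays reproducible alongside the MANIFEST.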


#14

The package manager. Basically, I am advocating against language-specific package managers and to only rely on language agnostic ones, that can package Qt, Boost as well as Julia. You may implement that in Julia, or use e.g. conda, and ship a self contained distribution.


#15

If a built package (Julia or anything else) is just a set of assets placed under the directories of an installation prefix, then Julia does not need “its own” package manager. Julia may be used to implement one, but any other would do the job just as well.


#16

That’s a fair position, but we have reproducibility and usability goals in the julia package manager that are not met by other systems. In particular, a system that only allows one set of versions per distribution release is insufficient for MANIFEST reproducibility. That said, the julia package manager is not really julia-specific, and you’re welcome to use it as your language-agnostic package manager.


#17

It is probably still Julia-specific in that it does not follow the prefix installation presupposed by most others. Also, there are lots of assumptions baked into Pkg (packages are git repositories, etc.)

Having language-specific package managers makes things more siloed, and also hurts sustainability, because it puts the burden of maintaining builds on the language communities alone.


#18

I wanted to point to the conda-forge initiative, which has nearly 5000 packages community-maintained by hundreds of package maintainers, covering not only Python but also R, Ruby, etc.

If there was a pure Julia client to conda channels, that would probably be a much more scalable thing to adopt, and much closer to the model of linux distributions.


#19

What about setting the Julia package dir? I work on different HPC environments with Julia and I am also an administrator on one of them. I create a setenv script which sets the appropriate paths, and the user can start off with the group installation. If anyone wants to manage their own packages, they can simply set their $JULIA_PKGDIR environment variable and they have a “fresh install”.

This is, by the way, much cleaner than a central Python installation (we tried different approaches: pyenv, conda, virtualenvs, etc., but users have to take care of everything if they want their own packages managed)…


#20

So you’re advocating that we adopt conda-forge and don’t support any other package manager (e.g. the HPC center use case suggested above)? If you’re not advocating that we only adopt conda-forge, then we’re back to the exact same situation, where we somehow need to identify what each library is called in each package manager and where to find it. There’s a separate discussion on conda-forge vs BinaryBuilder, which we can certainly have, but it’s somewhat tangential to this discussion.