System wide Pkg installation

#1

Is there a way to install packages system wide alongside those added by users without duplication?

I’d really like to be able to install (and precompile) a selection of commonly used packages in a system wide directory, then have users add only the packages they need without duplication. Whatever I’ve tried so far, Pkg.add (run by the user) resolves to a large tree of dependencies, most or all of which are already installed system wide.

When installing the system packages, I’m setting JULIA_PKGDIR to a system path, then running the Pkg commands. When a user logs in I’m not setting JULIA_PKGDIR, so Pkg.dir() resolves to e.g. /home/user/.julia/v0.5 and Pkg.installed() is empty. Setting LOAD_CACHE_PATH allows "using " to work, but running Pkg.add will create a new METADATA directory in Pkg.dir() and then start rebuilding everything. I would really like julia to build only the things which aren’t installed in the system wide directory.


#2

I have thought about the same problem for use on a compute cluster, and my solution (which I haven’t tested yet) is to do the system package installation as you describe and then require all users to prepend LOAD_PATH in their .juliarc with all the directories in the system package directory. It might also be a good idea to make the system package directory read-only so users can’t update system packages.
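The read-only part of that idea could be sketched as below. This is only an illustration, assuming a POSIX shell; lock_system_packages is a hypothetical helper name, and the directory to lock is whatever path the administrator used for the system packages.

```shell
# Hypothetical helper: remove write permission for everyone from a
# system package tree, so that users cannot modify or update it in place.
lock_system_packages() {
    dir=$1
    # Refuse to run on a path that is not an existing directory.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # a-w clears the write bit for owner, group and others, recursively.
    chmod -R a-w "$dir"
}
```

The administrator would need to restore write permission (for example with chmod -R u+w) before updating the system packages.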

Cheers,
Jared Crean


#3

Hi Jared,

I did exactly that, except I modified LOAD_PATH via a system wide juliarc (/etc/julia/juliarc.jl, I think). It works well enough to let julia find the packages, but I think this is just tricking the module system rather than the Pkg system.

As soon as a user runs a Pkg.add for something not in the system wide package cache, they will get a new ~/.julia/METADATA and the package system will start grabbing dependencies. This is regardless of whether some of those dependencies are already available in the LOAD_PATH.

In my specific case, the package the user is trying to add only REQUIRES PyPlot, which I have system wide (along with all of its dependencies). This means that each user ends up duplicating the entire system wide package cache.

I’ve been wondering if git submodules would be a potential solution, but I don’t know exactly how that would work.


#4

I have just installed Julia on our HPC cluster.
I installed it into a shared location ( /cm/shared/apps/julia )
This area is NFS mounted on all the compute nodes.

On HPC clusters it is more natural to use Modules files - can anyone comment on the interaction between .juliarc files and Modules files? I guess I really ought to ‘suck it and see’ for myself first.


#5

@ianabc as far as I know, what you want isn’t currently possible. It is, however, something that has been recognized as a needed feature, and will probably be part of Pkg3, the next version of the package system: https://github.com/JuliaLang/Juleps/blob/master/Pkg3.md

In the meantime, you might be able to use Declarative packages: https://github.com/rened/DeclarativePackages.jl which maintains a central store of packages and then uses hardlinks to create project-specific Julia package dirs without duplication. It requires you to use a different tool instead of Pkg.add, but it might be worth it in your case.


#6

@rdeits Thank you, I’ll keep my eye on Pkg3 and help if I can. I don’t think hardlinks will work in my case because my system repository storage and user storage are on different filesystems.


#7

@johnh I think you’re talking about environment modules, in which case you might want to play with JULIA_HOME and JULIA_PKGDIR as environment variables e.g. set JULIA_PKGDIR=/cm/shared/apps/julia/share/site/v0.5.

You should be able to configure a module with enough information to get julia up and running on the cluster, and you should be able to do the same system wide package hack that I’m doing. Unfortunately, I think you’ll run into the same problem that I’m hitting. You can point julia at various locations for package caches, precompilation files etc., but at the end of the day the Pkg tool suite only wants to deal with a single repository. If one of your users needs to Pkg.add something, they will either need write access to JULIA_PKGDIR or they will need to rebuild everything in their HOME.
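The choice between those two outcomes could be made in a module or profile script. A minimal sketch, assuming a POSIX shell; choose_julia_pkgdir is a hypothetical helper (nothing Julia or environment modules provide), and the ~/.julia fallback is taken from the Pkg.dir() output quoted earlier in the thread.

```shell
# Hypothetical helper: print the directory JULIA_PKGDIR should point at.
# The shared directory is used only when this user can write to it;
# otherwise the per-user location under HOME is printed instead.
choose_julia_pkgdir() {
    shared=$1
    if [ -d "$shared" ] && [ -w "$shared" ]; then
        printf '%s\n' "$shared"
    else
        printf '%s\n' "$HOME/.julia"
    fi
}
```

A login script might then run `JULIA_PKGDIR=$(choose_julia_pkgdir "$SHARED_PKGDIR"); export JULIA_PKGDIR`, where SHARED_PKGDIR is a placeholder for the site's shared path. This does not remove the duplication; it only makes explicit which of the two cases a given user falls into.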


#8

@johnh I am using environment modules to manage loading Julia itself (and other binary dependencies), but I haven’t found any way of making it interact with the Julia package manager. What I would really like is if JULIA_PKGDIR became JULIA_PKGPATH, allowing multiple directories to be searched in a defined order. It would also be good if there was some environment variable that corresponds to LOAD_PATH, but I don’t know of one.

@ianabc Could you have a login script symlink everything in the system package directory into the user’s package directory?
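Such a login script might look roughly like this. It is an untested sketch for a POSIX shell: link_system_packages is a hypothetical name, both directories are passed in by the caller, and skipping METADATA (so each user keeps their own copy) is an assumption rather than something Pkg is known to require.

```shell
# Hypothetical helper: symlink each package directory found in a shared
# package directory into the user's package directory, without touching
# anything the user already has under the same name.
link_system_packages() {
    sys_dir=$1
    user_dir=$2
    mkdir -p "$user_dir" || return 1
    for pkg in "$sys_dir"/*/; do
        [ -d "$pkg" ] || continue      # the glob matched nothing
        name=$(basename "$pkg")
        # Assumption: METADATA is Pkg's own bookkeeping, so it is not linked.
        [ "$name" = "METADATA" ] && continue
        target=$user_dir/$name
        # Skip names that already exist, whether real directories or links.
        if [ -e "$target" ] || [ -L "$target" ]; then
            continue
        fi
        ln -s "${pkg%/}" "$target"
    done
}
```

It could be called as `link_system_packages "$SYSTEM_PKGDIR" "$HOME/.julia/v0.5"`, with SYSTEM_PKGDIR standing in for wherever the system packages live. The script only creates the links; how Pkg treats linked packages afterwards is a separate question.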


#9

@JaredCrean2 I’m not sure. Symlinks could span the filesystem boundaries, so that wouldn’t be a problem, but you would have to be very careful with package pinning to make sure that Pkg doesn’t try to update the packages which are symlinked. I’m not sure exactly how Pkg treats pinned packages with git. If it just checks a version number and doesn’t try to update the repository for that package, then this might work. If it keeps your pinned version as a branch but still tries to update master, then it is going to run into permissions problems.

If I can make time today I’ll give this a try and let you know.