Slow installation of mono-repo

I am maintaining a privately registered git repo that contains multiple packages. The repo is becoming rather big, ~300 MB (I know I should do something about that). Whenever a user installs a package, the whole repo needs to be cloned, and even if the repo has already been cloned, the package manager still makes a new clone. As a result, installing multiple packages takes a long time, since N × 300 MB has to be downloaded.
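For concreteness, here is a minimal sketch of how one of these sub-packages is added directly by URL (the repo URL and subdirectory names are placeholders); installing by name through our private registry ends up cloning the same repository:

```julia
using Pkg

# Each package lives in its own subdirectory of the same ~300 MB repo,
# so every add triggers a fresh clone of the whole thing
# (URL and names below are placeholders):
Pkg.add(url="https://git.example.com/ourorg/MonoRepo.jl", subdir="PackageA")
Pkg.add(url="https://git.example.com/ourorg/MonoRepo.jl", subdir="PackageB")
```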

I am looking for some advice on how to manage this. A few things I came up with myself:

  • Reduce git repo size (difficult/dangerous, not a permanent solution)
  • Move each package to its own git repo (this may not be desirable for us)
  • Find a way to reuse the cloned git repo to install multiple packages?

Is there something else I can do to make installation of our packages less heavy? I notice that installation of packages from the General registry is often very fast, so I assume some smarter tricks are used there?

3 Likes

The main thing that makes installation of packages from General different is that packages are normally distributed by package servers instead of by cloning git repositories. There are ways to distribute your own packages through a package server too, e.g. LocalPackageServer.jl (https://github.com/GunnarFarneback/LocalPackageServer.jl), a Julia storage and package server for local packages.
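As a rough sketch, once such a server is running, clients only need to point Pkg at it via the `JULIA_PKG_SERVER` environment variable (the URL and package name below are placeholders; the variable is normally set in the shell or startup.jl before any Pkg operations):

```julia
# Point Pkg at the local package server (placeholder URL); normally this is
# set as an environment variable before Julia starts.
ENV["JULIA_PKG_SERVER"] = "https://pkgserver.example.internal"

using Pkg
Pkg.Registry.update()          # registry updates now go through the server
Pkg.add("MyPrivatePackage")    # delivered as a tarball instead of a git clone
```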

That said, there are probably optimization opportunities for this use case both for the package manager itself and for LocalPackageServer.

1 Like

Are you storing data in the git repo, or why is it so large? If so, you should definitely move the data out, as git is best suited for code only. For comparison: the pytorch repo is 1.2 GB with its complete history. If you want faster clones you can restrict the depth with `git clone --depth 1`, which only clones the latest version. For pytorch, for example, this reduces the size to 331 MB.

2 Likes

Unfortunately, we do indeed have some test data in our repository; I think that has caused the bloat over time. The `--depth 1` trick works when cloning manually; however, when the package manager takes over, I think it does a full clone and then checks out the git hash associated with the version of the package being installed.

One full clone is maybe not even so bad, but when multiple packages from the repo need to be installed, the full repo is cloned again every time. Maybe the package manager could recognize that it already has a full clone in the depot's clones folder?
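For reference, the clones I mean are the bare git clones Pkg keeps under the depot; a quick way to see what is already cached (assuming the default depot layout under ~/.julia):

```julia
# List the git clones Pkg has already made in the depot's clones folder
# (assuming the default depot layout):
clones_dir = joinpath(first(DEPOT_PATH), "clones")
foreach(println, readdir(clones_dir))
```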

Support for having a package in a subdirectory of a repository was added much later than the mechanisms used for the clones folder, so it’s likely that there are things that can be improved. But with package servers now being the main way packages are distributed, this is also likely to be fairly low priority for the Pkg developers.

Why not use lazy artifacts instead?
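In case it helps, here is a minimal sketch of turning the test data into a lazy artifact with Pkg.Artifacts (the paths, the artifact name `testdata`, and the download URL are all placeholders); the data is then only downloaded when something actually asks for it:

```julia
using Pkg.Artifacts

# One-off maintainer script: package the test data as an artifact and record
# it in Artifacts.toml as *lazy*, so users only fetch it on demand.
artifacts_toml = joinpath(@__DIR__, "Artifacts.toml")

# Copy the data into a content-addressed artifact tree.
data_hash = create_artifact() do dir
    cp("test/data", joinpath(dir, "data"))   # placeholder path to the test data
end

# Tar it up so it can be hosted outside the git repo.
tarball_sha256 = archive_artifact(data_hash, "testdata.tar.gz")

# Bind it in Artifacts.toml; clients download it lazily from the given URL.
bind_artifact!(artifacts_toml, "testdata", data_hash;
               download_info = [("https://example.com/artifacts/testdata.tar.gz", tarball_sha256)],
               lazy = true)
```

In the package's tests, `using LazyArtifacts` plus `artifact"testdata"` then triggers the download on first use only, and the git repository itself stays small.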