Whenever I do something with the package manager, it tends to begin by updating the installed registries, including the General registry. This can happen several times a day, which is understandable, since the General registry includes a large number of packages that are updated regularly.
This operation, however, takes a significant amount of time, and what is worse, it generates about 100 MB of network traffic. When I’m working over a mobile data network, this takes a significant toll on my data plan.
I began to wonder: is there a way to prevent such automatic updates? I don’t mind missing out on the newest and latest of each and every package I install, as long as I stay in control of when the registry is updated.
Maybe the time of the last update should be logged in .julia/logs so that registry updates can be subject to a timeout of one day or so. Sessions can be short-lived, so I’m not sure they are the right thing to track, and surely the people who need frequent registry updates (not sure who that is?) are in the minority, so they should be the ones asking for an explicit update.
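A minimal sketch of that kind of check, assuming a hypothetical timestamp file under .julia/logs (Pkg does not write anything like this today; the file name and one-day default are made up for illustration):

```julia
using Dates

# Hypothetical stamp file recording the last registry update.
const REGISTRY_STAMP = joinpath(DEPOT_PATH[1], "logs", "registry_updated")

# The registry is only considered stale once the recorded update is older than max_age.
function registry_update_due(max_age::Period = Day(1))
    isfile(REGISTRY_STAMP) || return true
    last = DateTime(readchomp(REGISTRY_STAMP))
    return now() - last > max_age
end

# Record the time of a successful registry update.
mark_registry_updated() = write(REGISTRY_STAMP, string(now()))
```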
Because we would then get even more “why doesn’t ]up update my package?” questions. Imagine if you only had apt upgrade without apt update, and somehow the default behavior didn’t include update, so the user had to sudoedit /etc/apt/blah... before running sudo apt upgrade.
I’m fine with adding a command called registry freeze, but I think that when users run ]up, they expect to download artifacts etc. for newer versions of their packages anyway.
I would suppose you are running Windows, where registry updates are fairly slow as of now because of Defender interaction. They should be much faster (i.e. similar to Linux) with 1.7, as there will be no registry extraction, so you could try 1.7.0-beta2 to see if it is better for you?
Aside from the improvements coming in 1.7 for Windows users, it still seems a valid concern that so much network traffic is generated just to force registry updates (I didn’t know it was on the order of 100 MB). That seems excessive if all one wants to do is add one small package; at least for me, it’s seldom a priority to be on a sub-one-day bleeding-edge registry version, and if it is, I can run registry up. Summed across all Julia users, that’s a lot of data and energy.
I also think some sort of time-out option would be meaningful.
Good guess, but no: I’m on Linux, but with ZFS, and I heard or read somewhere that ZFS does not cope well with the unpacking of the registry. So Julia 1.7 should improve that, too. I will give it a shot as soon as the release comes out.
Note, however, that my main concern was the network usage, not the speed. Still, thanks for the insight!
I recently ran into this problem while working on a ship with a satellite internet connection. Just updating the packages in a local project (to versions I’d already downloaded and installed in the global environment) triggered a re-download of the whole registry, which was a huge PITA. Having a simple option like registry freeze, or just documentation of @mcabbot’s workaround, would be great.
I think part of the problem is that registry updates currently require O(# of packages) communication. I think we could lower that to O(# of outdated packages) by having the client send when it last updated the registry.
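Very roughly, the client would tell the server which registry state it already has, something like the following (the endpoint and query parameter are purely illustrative; nothing like this exists in the Pkg protocol today):

```julia
using HTTP

# Illustrative only: request just the registry entries that changed since the
# state the client already holds. No such endpoint exists on Pkg servers.
resp = HTTP.get("https://pkg.julialang.org/registry/changes";
                query = Dict("since" => "<tree hash of the registry we last fetched>"))
```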
If I remember correctly, at some point someone looked into transmitting only a diff of the registry, but that was probably before the recent work on keeping the registry in a tarball.
I don’t think a diff is the right answer; that would add a ton of work server side. I think the easiest thing that would get the same benefits is having the package server keep separate tarballs for today, this week, this month, and all time. That way we don’t have to calculate diffs on the fly, but we still get 90% of the benefit.
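As a sketch of how a client might pick among such buckets (the paths and cutoffs are made up for illustration; nothing like this is served today):

```julia
using Dates

# Illustrative only: choose which pre-built tarball to fetch based on how
# stale the client's registry is.
function registry_tarball_path(last_update::DateTime)
    age = now() - last_update
    age <= Day(1)  && return "/registry/changes/day.tar.gz"
    age <= Day(7)  && return "/registry/changes/week.tar.gz"
    age <= Day(30) && return "/registry/changes/month.tar.gz"
    return "/registry/full.tar.gz"
end
```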
Are there any issues with just updating the registry at most once a day? It seems the only people who would need more frequent updates are the ones actively registering packages and testing them out, and in that case they can just do a ]up.
@staticfloat and I are already working on this. The rough plan is still what was outlined in the original Pkg protocol issue. Diffs between the registry tarball that the user has and the one that they’re downloading will be computed with the BSDiff package, which has the ability — absent from the similarly named command-line tool — to generate an index (it’s a suffix array, to be technical about it) for each old file that speeds up generation of diffs for different new files. Generating an index for a registry tarball takes several seconds but once you have an index for the old version, generating a diff with any given new registry tarball takes about half a second. We will cache both index files and pairwise diffs, on the premise that both will have good temporal cache locality: if one person is upgrading from a given registry version, chances are many people will be upgrading from that same registry version; if one person upgrades from registry version A to B, chances are other people will need the same diff. Serving a cache hit for an exact registry diff will be basically instantaneous and the diffs are very compact.
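For concreteness, the index-then-diff workflow described above looks roughly like this with the BSDiff package (file names are illustrative, and the exact entry points may differ from this sketch):

```julia
using BSDiff

# Illustrative file names for two registry tarball versions.
old_tarball = "registry-A.tar"
new_tarball = "registry-B.tar"

# Precompute the suffix-array index for the old tarball once (takes a few seconds)...
index_file = bsindex(old_tarball)

# ...then diffs against any newer tarball reuse that index and are fast.
patch_file = bsdiff((old_tarball, index_file), new_tarball)

# A client that still has the old tarball reconstructs the new one from the small patch.
new_copy = bspatch(old_tarball, patch_file)
```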
There are lots of tricky details, but it has pretty much all been worked out at this point. For example, you want to minimize diffs by using a stable, consistent tarball format that orders content consistently and doesn’t capture a lot of irrelevant details that change arbitrarily, like timestamps, user/group IDs, and detailed permissions beyond what git cares about. That’s one of the reasons I created the Tar package, which is now a stdlib and is used to generate the standardized tarballs served by Pkg servers; it does all of that by design and more: if two trees have the same git tree hash, then the tarballs generated for them will be identical. This isn’t actually necessary for diffing registry tarballs that we don’t extract, but it becomes important if we’re going to be able to use diffs for things that we do extract, like packages and artifacts. (Because you need to be able to reconstruct the old tarball in order to apply a patch to it.)
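A small illustration of that determinism property with the Tar stdlib (the directory names are made up):

```julia
using Tar

# Two checkouts of the same registry tree, i.e. trees with the same git tree hash.
tarball_a = Tar.create("checkout_a")   # returns the path of a temporary tarball
tarball_b = Tar.create("checkout_b")

# Because Tar normalizes timestamps, ownership, and permissions, the two
# tarballs are byte-identical whenever the git tree hashes match.
read(tarball_a) == read(tarball_b)

# The git tree hash can also be computed directly from a tarball:
Tar.tree_hash(tarball_a)
```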
The diffs created this way are quite tiny, so they should massively help in situations where someone is on a slow connection. With that change I highly doubt that it will be necessary to throttle registry updates (they’re already throttled to once per Julia session). We could potentially still add a “no more often than X minutes” limit on updates if someone cares to implement that.
I’ve got my home directory on an NFS server which is backed by a glusterfs cluster. On glusterfs, stat is not a trivial operation. The first time I built the registry in my home dir it took a LONG time on Linux; it could have been an hour. I haven’t tried to do Pkg installs yet, but I suspect they are nontrivial. The registry filesystem tree is quite large, and that means a lot of time to traverse it. I realize this is a bit of an edge case.