@staticfloat and I are already working on this. The rough plan is still what was outlined in the original Pkg protocol issue. Diffs between the registry tarball that the user has and the one that they’re downloading will be computed with the BSDiff package, which has the ability — absent from the similarly named command-line tool — to generate an index (it’s a suffix array, to be technical about it) for each old file that speeds up generation of diffs for different new files. Generating an index for a registry tarball takes several seconds but once you have an index for the old version, generating a diff with any given new registry tarball takes about half a second. We will cache both index files and pairwise diffs, on the premise that both will have good temporal cache locality: if one person is upgrading from a given registry version, chances are many people will be upgrading from that same registry version; if one person upgrades from registry version A to B, chances are other people will need the same diff. Serving a cache hit for an exact registry diff will be basically instantaneous and the diffs are very compact.
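To make the caching idea concrete, here is a minimal Python sketch of the two-level cache described above: one index per old tarball, one diff per (old, new) pair. Everything in it is an assumption for illustration, not the actual Pkg server code: `DiffCache`, `toy_index` and `toy_diff` are hypothetical names, the "index" is a naively built suffix array, and the "diff" is a placeholder rather than a real BSDiff patch.

```python
import hashlib

def toy_index(old: bytes) -> list:
    # Stand-in for the expensive step: a naively built suffix array.
    return sorted(range(len(old)), key=lambda i: old[i:])

def toy_diff(old: bytes, index: list, new: bytes) -> bytes:
    # Placeholder "diff": a real implementation would use the index to
    # find matches in `old` and emit a compact patch instead.
    assert len(index) == len(old)
    return new

class DiffCache:
    """Cache indexes per old version and diffs per (old, new) pair."""

    def __init__(self, build_index, make_diff):
        self.build_index = build_index
        self.make_diff = make_diff
        self.indexes = {}   # old hash -> index
        self.diffs = {}     # (old hash, new hash) -> diff
        self.stats = {"index_builds": 0, "diff_builds": 0}

    def get_diff(self, old: bytes, new: bytes) -> bytes:
        old_id = hashlib.sha256(old).hexdigest()
        new_id = hashlib.sha256(new).hexdigest()
        key = (old_id, new_id)
        if key in self.diffs:
            return self.diffs[key]          # exact hit: nothing to compute
        if old_id not in self.indexes:
            self.indexes[old_id] = self.build_index(old)   # slow, done once per old version
            self.stats["index_builds"] += 1
        diff = self.make_diff(old, self.indexes[old_id], new)  # fast once the index exists
        self.stats["diff_builds"] += 1
        self.diffs[key] = diff
        return diff

cache = DiffCache(toy_index, toy_diff)
v1, v2, v3 = b"registry v1", b"registry v2", b"registry v3"
cache.get_diff(v1, v2)   # builds the index for v1, then the diff
cache.get_diff(v1, v3)   # reuses the v1 index
cache.get_diff(v1, v2)   # served straight from the diff cache
```

The point of the split is that the slow step depends only on the old version, so many clients upgrading from the same registry version share one index even when their targets differ.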
There are lots of tricky details, but pretty much all of them have been worked out at this point. For example, you want to minimize diffs by using a stable tarball format that orders content consistently and doesn’t capture irrelevant details that change arbitrarily, like timestamps, user/group IDs, and detailed permissions beyond what git cares about. That’s one of the reasons I created the Tar package, which is now a stdlib and is used to generate the standardized tarballs that are served by Pkg servers. It does all of that by design, and more: if two trees have the same git tree hash, then the tarballs generated for them will be identical. This actually isn’t necessary for diffing registry tarballs that we don’t extract, but it becomes important if we’re going to use diffs for things that we do extract, like packages and artifacts, because you need to be able to reconstruct the old tarball in order to apply a patch to it.
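As a rough illustration of what "stable tarball format" means here, the following Python sketch uses the standard `tarfile` module to build a tarball with sorted paths, zeroed timestamps and owners, and permissions reduced to the executable bit. It is not how the Tar package does it; `normalized_tarball` and `write_tree` are hypothetical helpers, and the sketch only handles regular files (no symlinks or empty directories).

```python
import io
import os
import stat
import tarfile
import tempfile

def normalized_tarball(root: str) -> bytes:
    """Tar the regular files under `root` deterministically."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()                      # fix the traversal order in place
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                info = tarfile.TarInfo(os.path.relpath(full, root).replace(os.sep, "/"))
                info.size = os.path.getsize(full)
                info.mtime = 0                   # drop timestamps
                info.uid = info.gid = 0          # drop owner IDs
                info.uname = info.gname = ""     # drop owner names
                # Keep only what git tracks: executable or not.
                is_exec = os.stat(full).st_mode & stat.S_IXUSR
                info.mode = 0o755 if is_exec else 0o644
                with open(full, "rb") as f:
                    tar.addfile(info, f)
    return buf.getvalue()

def write_tree(root, files, mtime):
    # Hypothetical helper: write (relative path, bytes) pairs with a fixed mtime.
    for rel, data in files:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        os.utime(path, (mtime, mtime))

files = [("A/Package.toml", b'name = "A"\n'), ("Registry.toml", b'name = "General"\n')]
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    write_tree(d1, files, mtime=1_000_000)
    write_tree(d2, list(reversed(files)), mtime=2_000_000)   # other order, other mtimes
    same = normalized_tarball(d1) == normalized_tarball(d2)
print(same)
```

Two trees with identical content but different creation order and timestamps should come out byte-for-byte equal, which is the property that lets a client rebuild the old tarball locally before applying a patch.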
The diffs created this way are quite tiny, so they should massively help in situations where someone is on a slow connection. With that change I highly doubt that it will be necessary to throttle registry updates (they’re already throttled to once per Julia session). We could potentially still add a “no more often than X minutes” limit on updates if someone cares to implement that.
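If someone did want to try the "no more often than X minutes" limit, a minimal Python sketch of the logic could look like the following. `UpdateThrottle` is a hypothetical name and nothing here reflects existing Pkg code; it also only tracks time within one process, so a real version would presumably have to persist the last update time to cover multiple Julia sessions.

```python
import time

class UpdateThrottle:
    """Allow an update only if at least `min_interval` seconds have passed."""

    def __init__(self, min_interval: float, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the logic can be tested
        self.last_update = None     # None means no update has happened yet

    def should_update(self) -> bool:
        now = self.clock()
        if self.last_update is not None and now - self.last_update < self.min_interval:
            return False            # too soon, skip this registry update
        self.last_update = now
        return True

throttle = UpdateThrottle(min_interval=10 * 60)   # at most once every 10 minutes
if throttle.should_update():
    pass  # fetch and apply the registry diff here
```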