Publish packages without complete commit history

First the question itself: is there a way, currently existing or planned, to publish/register a package without making the whole commit history available to everyone? Basically, something like what PyPI does: it only uploads and stores specific “releases”, not every commit.

The motivation is that I just cannot become comfortable with every person in the world seeing all edits I make to all of my projects, with exact dates/times/etc. If this happens on GitHub, there are even notifications and a feed listing all activity in detail. I guess there are other people who share similar views as well.

So it would be useful if the Pkg system had some kind of central storage where one can upload package releases with an explicit command not tied to git commits in any way. Now that Pkg is becoming less and less reliant on GitHub, this seems technically possible. This would allow developing packages in any kind of private/local repo, and only publishing specific versions when the author sees fit.

The lack of such a feature is almost the only reason I maintain a “private registry” with my packages that are currently used either by just me or by a few of my colleagues. Maintaining it is really easy thanks to LocalRegistry.jl, but it could still be useful to make those packages available to everyone in General.

You can have a public repo that only contains the releases and keep the rest private.

I don’t see how this is a problem if your code is actually open source.

Keep in mind that history is especially useful in understanding code, eg sometimes a particular line makes more sense in the context of a PR, etc.

“Open source” just means that the package code is available and a user can do whatever he wants with it. This is very different from publishing the complete history of the code with exact timings of all edits. It’s basically the difference between saying “currently I’m at home in front of my PC” vs “I’ve been at home at these times: XX-XX, XX-XX, …”.

The suggestion by @fredrikekre looks reasonable, although it requires significant overhead in managing two git repos.

I also don’t hugely see the value, but I can imagine other people have more reasons to be privacy concerned than me.
Which is their right.

I found an interesting piece about the option of using environment variables to control what git logs as the commit time.
https://lebenplusplus.de/2017/01/28/how-to-protect-your-privacy-by-changing-your-git-commit-times/

It seems like it could be fairly reasonable to make a wrapper script for git that records all commits as being made at 1-second intervals starting at 0:00 UTC on the day they are made.
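
A minimal sketch of the date-coarsening part of such a wrapper, in Python. `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` are the environment variables git reads for commit timestamps; the function names and the `commits_today` counter are hypothetical, and how the wrapper counts the day's earlier commits is left open.

```python
import os
from datetime import datetime, timedelta, timezone


def coarse_commit_date(now: datetime, commits_today: int) -> str:
    """Midnight UTC of `now`'s day, plus one second per earlier commit that day.

    The result uses git's internal date format: "<unix seconds> <utc offset>".
    """
    day_start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    stamp = day_start + timedelta(seconds=commits_today)
    return f"{int(stamp.timestamp())} +0000"


def git_env(now: datetime, commits_today: int) -> dict:
    """Copy of the current environment with both git timestamps coarsened."""
    stamp = coarse_commit_date(now, commits_today)
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp
    env["GIT_COMMITTER_DATE"] = stamp
    return env
```

The returned dictionary could be passed as `env=` to `subprocess.run(["git", "commit", ...])`, so only that one git call sees the altered dates.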

OK, on to Pkg itself.
We can ignore the Pkg server stuff for now, since that assumes the server acts primarily as a caching layer, and there is still some canonical source it fetches from if it doesn’t have the content.
(Maybe that can change, but it’s not what I want to comment on right now.)

First thing to note is that the Registry itself contains nothing of your git history.
What it contains is a repo URL
and a list of git-tree-shas for each version
plus some stuff for sorting out dependencies.
It’s notable that it is a tree-sha, not a commit sha, so it is tied only to the content of the release, not to the actual git history associated with it.

According to Stefan Karpinski, Pkg itself is by design not supposed to be too tied to git,
so it should be fairly reasonable to swap in another version control system.
(I was considering at one point whether it would be viable to make Pijul work.)

Pkg.add (rather than dev) is even less tied to git already.
(again ignoring the Pkg server stuff):
Using git is the fallback for Pkg.add.
Before resorting to that it checks if the repo URL (in the registry) is for github.
(*Special-casing GitHub like that is a bit gross, but it got us performance that was really wanted for the 1.0 release. If other providers had a similar API, they should be added.)
If it is, then it doesn’t touch git at all.
It does a direct fetch from the URL given by:

"https://api.github.com/repos/$(repoowner)/$(reponame)/tarball/$(tree-hash)"

It seems like it would be fairly feasible (technically, not necessarily socially)
to have a similar special case for a server that can host arbitrary content-addressed tarballs:
setting repo to
http://www.HostingServer.com/$(repoowner)/$(reponame)
and having that trigger a download from
http://www.HostingServer.com/$(repoowner)/$(reponame)/tarball/$(tree-hash)

Though since it would not be a git repo, operations like dev that do rely on it being a git server will break.
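
The two URL schemes above can be sketched as plain string construction, in Python. The GitHub pattern is the one quoted in this post; the second function only mirrors the proposed convention for a hypothetical tarball host, which Pkg does not implement. Function names are invented for the example.

```python
def github_tarball_url(owner: str, name: str, tree_hash: str) -> str:
    """Tarball URL for a GitHub-hosted package, following the pattern above."""
    return f"https://api.github.com/repos/{owner}/{name}/tarball/{tree_hash}"


def hosted_tarball_url(repo_url: str, tree_hash: str) -> str:
    """Tarball URL on a hypothetical content-addressed host.

    Assumes the convention proposed above: append /tarball/<tree-hash>
    to whatever repo URL the registry lists.
    """
    return f"{repo_url.rstrip('/')}/tarball/{tree_hash}"
```

In both cases the only version-specific input is the tree hash, so the server never needs to expose commits.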


But anyway, to loop back around to Fredrik’s suggestion of a public repo that only contains releases:
because Pkg depends only on the tree-hash, the content just needs to be the same for the download to work if the repo is on GitHub.
So I think you could have a private repo that has the full history,
and a private package registry that points to it,
and a public repo that only has releases, which you insert without any history preserved: delete all files, make a commit, add all files from the release, make a commit.
And the neat thing is, you would be able to directly copy-paste the Versions.toml file from your private registry to the public one.
All the tree-shas would be the same.
(Only the Package.toml would need to have the URL changed.)
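
As a small illustration of that last step, here is a Python sketch that rewrites the `repo` line of a registry `Package.toml` and leaves everything else untouched. The helper name, the example URLs, and the assumption that the file has exactly one top-level `repo = "..."` line are mine, not something the registry format guarantees.

```python
import re


def retarget_package_toml(text: str, public_url: str) -> str:
    """Return `text` with its single `repo = "..."` line pointing at `public_url`."""
    new_line = f'repo = "{public_url}"'
    result, count = re.subn(
        r'^repo[ \t]*=[ \t]*".*"[ \t]*$',
        lambda _match: new_line,  # a callable avoids backslash escapes in the URL
        text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError(f"expected exactly one repo line, found {count}")
    return result
```

The Versions.toml next to it would be copied byte-for-byte, since its tree hashes do not depend on where the repo lives.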

Or different branches for detailed history and for release commits. Then do not rebase them - just merge.

Interesting, thanks for the in-depth explanation!

Looks like the simplest solution that doesn’t even require any Pkg modifications is indeed making a separate repo that gets populated only when releasing a new version. With a small script I think even the commits can be preserved, erasing only their datetimes.

If that’s the only concern, you can obfuscate/strip timestamps, and push to GitHub via a cron job. Eg see

and similar discussions.

… and since you can do this quite easily, no one can really say what time you committed anyway.