Someone kindly pointed out to me that my package StatisticalRethinking.jl is severely bloated. It contains more than 1 GB of garbage in the .git/objects/pack directory: multiple copies of a few pretty old IJulia notebook (.ipynb) files I used in v1 (many other v1 notebooks are no longer present).
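A quick way to see which objects make up that pack is to list the largest blobs reachable from any ref, including files that were deleted from the working tree long ago. The sketch below uses only standard git plumbing (`rev-list --objects --all` piped into `cat-file --batch-check`). The function name `largest_blobs` and the throwaway demo repository are illustrative assumptions, not something from this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# largest_blobs REPO [N]: print "size sha path" for the N largest blobs
# reachable from any ref, biggest first. Sizes are uncompressed bytes.
largest_blobs() {
  local repo="$1" n="${2:-10}"
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file \
      --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -k1,1nr |
    awk -v n="$n" 'NR <= n'   # reads all input, so no SIGPIPE under pipefail
}

# Demo on a throwaway repo: a notebook committed once and deleted later
# is still part of the history (and therefore of .git/objects/pack).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
g() { git -C "$demo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q
mkdir -p "$demo/notebooks"
head -c 200000 /dev/zero > "$demo/notebooks/old.ipynb"
printf 'hello\n' > "$demo/README.md"
g add -A
g commit -q -m "add notebook"
g rm -q notebooks/old.ipynb
g commit -q -m "remove notebook"

largest_blobs "$demo" 5
```

Against a real checkout this would be called as `largest_blobs /path/to/StatisticalRethinking.jl 20`. The sizes are uncompressed, so they can overstate what the pack actually occupies on disk.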
I have tried to remove those files and rewrite the GitHub history (basically following many of the recipes available online), to no avail. The last resort would be to delete the repo, create a new one, and make the current content of the repo the initial commit (with a new .git subdirectory).
Before doing that I wanted to check whether there are any other suggestions. Losing the history is not a major concern content-wise, but I’m not sure whether the JuliaHub tools will be able to handle the next version (v4.4.2). I don’t think any published packages depend on StatisticalRethinking.jl, apart from a few Julia projects in the StatisticalRethinking GitHub organization.
Note that if you rewrite the history by changing the content of released revisions, which would change their git-tree-sha1, you’d make your own package uninstallable (if it is registered).
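To make the git-tree-sha1 point concrete: a tree hash depends only on the files it contains, not on the commits that led up to it. In the General registry, each package’s `Versions.toml` records one such `git-tree-sha1` per released version. The sketch below builds three throwaway repositories (the helpers `commit_tree` and `release_files` are hypothetical names). The same content reached through different histories gives the same tree hash, while purging a single notebook gives a different one, so a rewritten release no longer matches what the registry pinned.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# commit_tree DIR MSG: commit everything in DIR, print the commit's tree hash.
commit_tree() {
  git -C "$1" add -A
  git -C "$1" -c user.name=demo -c user.email=demo@example.com commit -q -m "$2"
  git -C "$1" rev-parse 'HEAD^{tree}'
}

# release_files DIR: write identical "release" content into DIR.
release_files() {
  mkdir -p "$1/src" "$1/notebooks"
  printf 'module Demo end\n' > "$1/src/Demo.jl"
  printf '{"cells": []}\n'   > "$1/notebooks/clip.ipynb"
  printf 'final\n'           > "$1/README.md"
}

for r in a b c; do git init -q "$work/$r"; done

# a: a draft commit first, then the release content
printf 'draft\n' > "$work/a/README.md"
commit_tree "$work/a" "draft" > /dev/null
release_files "$work/a"
tree_a=$(commit_tree "$work/a" "release")

# b: the same content imported in one history-free commit
release_files "$work/b"
tree_b=$(commit_tree "$work/b" "fresh import")

# c: the same content with the notebook purged
release_files "$work/c"
rm "$work/c/notebooks/clip.ipynb"
tree_c=$(commit_tree "$work/c" "notebook removed")

echo "a=$tree_a"
echo "b=$tree_b"
echo "c=$tree_c"
```

This is why a history-free import of the current content is harmless for new versions, while editing the files inside an already released revision is not.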
Have you seen an example of someone merging a history-free new version of a package? Or is there simply no remedy for this?
Petr:
I had already tried the Git book’s approach, but after rewriting the git history the file was corrupted. That is when I went online and tried Stack Overflow suggestions (which are all variations on what the book suggests).
I think it is the rewrite step, which does complete after several minutes, that corrupts the file (git itself warns about this):
rob@Rob-16-MBP-2 StatisticalRethinking % git filter-branch -f --index-filter \
'git rm --ignore-unmatch --cached notebooks/03/clip-02-05.ipynb' -- e0ec1390^..
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
Proceeding with filter-branch...
Maybe the simplest solution is to move the contents of StatisticalRethinking.jl to a new package, StatisticalRethinkingBase.jl (since v2, StatisticalRethinking.jl has only been intended to support a number of Julia projects).
Over time StatisticalRethinking.jl will become a general introduction to the other packages in the StatisticalRethinking GitHub organization.
I remember that DifferentialEquations.jl went through something similar years ago because of large PDFs in the documentation, or something like that. Maybe @ChrisRackauckas could advise on the best approach.
The best approach is to never let a repo get into that state in the first place. However, note that these days users don’t download the repo: the package server only sends the release version, so you don’t really need to worry about the repo’s history as far as using the package is concerned.
If you want to fix the history for other reasons, you could do a BFG repo clean and open a PR to General fixing all of the SHAs.
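One way to see why history size does not affect ordinary installs: the content of a release is just the tagged tree, while a full clone (as made for a dev-ed package) also carries every object in the history. The sketch below uses `git archive` of a tag as a rough stand-in for what the package server sends; that correspondence, the file names, and the tag are assumptions for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q
# v1: a large, incompressible stand-in for an old notebook
mkdir -p "$repo/notebooks"
head -c 500000 /dev/urandom > "$repo/notebooks/old.ipynb"
printf 'module Demo end\n' > "$repo/Demo.jl"
g add -A
g commit -q -m "v1 with notebook"

# Later release: the notebook is gone from the tree, but not from the history.
g rm -q -r notebooks
g commit -q -m "drop notebook"
g tag v2.0.0

# Roughly what a release download corresponds to: only the tagged tree.
release_bytes=$(g archive --format=tar v2.0.0 | wc -c | tr -d ' ')
# What a full clone carries: every object in the history.
history_kib=$(du -sk "$repo/.git" | cut -f1)

echo "release tarball: $release_bytes bytes"
echo ".git directory:  $history_kib KiB"
```

The tarball stays small because it never contains the deleted notebook, whereas the .git directory keeps it for as long as any commit references it.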
This doesn’t really make sense: you cannot (by definition) change the content of a release, because the release is addressed by the content itself. As long as that content exists somewhere in the repo, you are fine. Keeping it available is as easy as, e.g., having a tag for each released version. At that point you can do whatever you want with the history of the master branch.
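The tag-based approach can be checked on a throwaway repository: replace the branch with a single history-free commit, prune unreachable objects, and test whether the tree pinned for a release still exists. This is a local sketch only (nothing is pushed to GitHub, where server-side garbage collection is outside your control); the notebook, tag, and helper names (`purge_unreachable`, `has_tree`) are made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q
mkdir -p "$repo/notebooks"
head -c 100000 /dev/zero > "$repo/notebooks/big.ipynb"
printf 'module Demo end\n' > "$repo/Demo.jl"
g add -A
g commit -q -m "v1.0.0 with notebooks"
g tag v1.0.0
released=$(g rev-parse 'v1.0.0^{tree}')   # the tree a registry would pin
branch=$(g symbolic-ref --short HEAD)

# Replace the branch with a single history-free commit without the notebooks.
g checkout -q --orphan fresh
g rm -q -r --cached notebooks
rm -rf "$repo/notebooks"
g commit -q -m "history-free restart"
g checkout -q -B "$branch"
g branch -q -D fresh

# Drop reflogs and unreachable objects, as a real cleanup would.
purge_unreachable() { g reflog expire --expire=now --all; g gc -q --prune=now; }
has_tree() { if g cat-file -e "$released" 2>/dev/null; then echo yes; else echo no; fi; }

purge_unreachable
with_tag=$(has_tree)       # the tag still reaches the release tree

g tag -d v1.0.0 > /dev/null
purge_unreachable
without_tag=$(has_tree)    # nothing reaches it any more

echo "release tree kept with tag: $with_tag; after deleting the tag: $without_tag"
```

With the tag in place the release tree survives garbage collection even though the branch no longer mentions it; once the tag is deleted and the repository is pruned, the tree is gone, which is the situation that would leave a registered version uninstallable.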
Thanks everybody for the very helpful comments. I’ll do what I indicated above (and similar to what @dilumaluthge suggests).
StatisticalRethinking.jl will remain around as an anchoring point and overall README. This will preserve the stars (@cormullion).
As I did with Stan.jl for the StanJulia GitHub organization, it will have no additional functions (these will all move to StatisticalRethinkingBase.jl), just organization-level docs. StatisticalRethinking.jl will continue to be used for overall testing, e.g. functionality comparisons between Stan and Turing, or showing new options such as the recently released ParetoSmooth.jl, AxisKeys.jl and DimensionalData.jl packages.
As @ChrisRackauckas pointed out (which I didn’t know), the size issue only matters if the package is dev-ed.