TagBot size limitation?

After merging a minor update and attempting to tag DynamicHMCModels.jl, it turned out that TagBot fails to create the tag (see, e.g., the TagBot result) because the repository is too big (>512MB)…

One reason is that I have been storing too many notebooks (and possibly multiple versions of them). I’m considering git filter-branch operations on these repositories to remove a number of big files that have been, or will be, dropped in version 1.x of these packages.
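
Before rewriting history, it may help to check which file types actually dominate. The following is a rough sketch using only Julia’s standard library. The helper name `size_by_extension` is made up for illustration, and it only measures the files in the current checkout, not older versions stored in the git history, so treat the numbers as a first hint rather than the full picture.

```julia
# Sum file sizes (in bytes) per file extension under `root`.
# Anything inside a `.git` directory is skipped, so only checked-out files count.
# `size_by_extension` is a hypothetical helper, not part of any package.
function size_by_extension(root::AbstractString)
    totals = Dict{String,Int64}()
    for (dir, _, files) in walkdir(root)
        # Skip git's internal object store and anything below it.
        ".git" in splitpath(relpath(dir, root)) && continue
        for f in files
            path = joinpath(dir, f)
            islink(path) && continue          # do not follow symlinks
            ext = lowercase(splitext(f)[2])   # e.g. ".ipynb"; "" if no extension
            totals[ext] = get(totals, ext, 0) + filesize(path)
        end
    end
    # Largest total first.
    return sort(collect(totals); by = last, rev = true)
end
```

Calling `size_by_extension(".")` from the package root returns a vector of `extension => bytes` pairs, which should make it obvious whether `.ipynb` files (or images embedded elsewhere) are the main contributors.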

Has anyone else dealt with this problem, or does anyone know a good solution?

My understanding of git is rather superficial. My current WIP approach is based on this work.

Rob

It looks like the limitation is specific to the GitHub App form of TagBot.

Have you tried switching to the GitHub Action form of TagBot?

Cc: @christopher-dG

On a separate note, instead of checking in notebooks, you might consider using Literate.jl. You check in plain Julia source files, and Literate generates the notebooks for you.

Just to add that if you do choose to switch to Literate.jl as @dilumaluthge mentioned, you could look at this PR where I recently made that switch for Convex.jl’s examples. The docs/make.jl file generates the notebooks from the literate files, adds a Project.toml and a README, and then zips everything up so we can serve that bundle from the documentation site. We also convert the literate files to markdown and serve them directly via Documenter, so they show up rendered on the docs site without needing to open a notebook (see the examples in the sidebar of the Convex.jl docs home page).
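
A minimal sketch of that kind of docs/make.jl step, not the actual Convex.jl script: the function name `build_examples` and the three directory arguments are assumptions for illustration, and the Project.toml, README, and zip steps are left out. It relies on `Literate.markdown` and `Literate.notebook` from Literate.jl; passing `execute = false` leaves the notebooks unexecuted, so no plot output ends up stored in them.

```julia
using Literate

# Convert every Literate source file in `literate_dir` into
#   * a markdown page for Documenter (code is run later, when Documenter builds the docs)
#   * a notebook that is NOT executed, so it contains no stored output
# `build_examples` and the directory layout are hypothetical.
function build_examples(literate_dir, markdown_dir, notebook_dir)
    sources = filter(f -> endswith(f, ".jl"), readdir(literate_dir; join = true))
    for src in sources
        # Emits `@example` blocks that Documenter evaluates at docs build time.
        Literate.markdown(src, markdown_dir; documenter = true)
        # `execute = false` skips running the code while generating the notebook.
        Literate.notebook(src, notebook_dir; execute = false)
    end
    return sources
end
```

If the generated notebooks and markdown files are produced at docs build time like this, they do not need to be committed to the main branch at all, which is one way to keep large outputs out of the package history.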

Hi! Dilum is right, the way around this is to use GitHub Actions as described here.

Thanks for the answers/suggestions! I’ll switch to GitHub Actions.

In the long run I would like to reduce the size though. It’s not extremely slow, but certainly not snappy either when installing the packages.

I’ll certainly look at the Convex.jl approach. From your description, the bundles are separate GitHub repos which can be “installed”, ready for Jupyter? That’s a cool idea. On the .md files, doesn’t the GitHub Pages branch contribute to the overall repo size? Clearly I need to study this more, as it might also be applicable to another project where many separate cases (for different customers) have to be analyzed using FE methods!

I do use Literate.jl to generate notebooks and .md files, so I’m not sure how that can further help me. On its own I think Literate.jl is not a problem, but in my case I have big graphics (sample chains etc.) in there and I think these are the root cause.

Indeed, switching from the GitHub App to the GitHub Action for TagBot worked great. It lets me tackle the size problem more gradually and study how Convex.jl is structured. One solution could be to generate the notebooks and not execute them.

Again, thanks for the help!

I’ll certainly look at the Convex.jl approach.

I hope it helps! I loosely based my approach on how TimeseriesPrediction.jl handles its examples, so that could be another package to look at.

From your description, the bundles are separate GitHub repos which can be “installed”, ready for Jupyter? That’s a cool idea.

For Convex.jl, I ended up just putting all the different examples (which are in separate literate files) together in one zip file after converting them to notebooks. So everything is in the Convex.jl git repo and there’s only one zip file. But with the Project.toml file and the change from IJulia.jl pull request #820 by davidanthoff (“Set JULIA_PROJECT="@." for kernel”), if one starts Jupyter from the directory of the unzipped folder, the environment should be right, so you don’t have to add the dependencies (just instantiate). For more involved setups, something similar could work for groups of examples.

On the .md files, doesn’t the GitHub Pages branch contribute to the overall repo size? Clearly I need to study this more, as it might also be applicable to another project where many separate cases (for different customers) have to be analyzed using FE methods!

Ah, probably it does. Repo size wasn’t really something I was thinking about for Convex.jl, just simplifying diffs from PRs (it’s easy to review PRs to literate files compared to PRs that change notebooks).

One solution could be to generate the notebooks and not execute them.

That’s what I’m doing now for Convex.jl. I’m still executing the code via Documenter in the markdown files (so the examples get rendered in the docs), but not executing it for the notebooks. I’d actually prefer to execute in the notebooks as well, since our files aren’t too big, and that way Travis will fail the docs build stage if the notebooks fail to execute. However, it already takes 20 minutes to build the docs on Travis because all the examples are executed, so I didn’t want to push the build time up more by also executing the code in the notebooks. Anyway, docs build time is also something to think about if you’re considering putting the examples in the docs like we did with Convex.jl. It’s much slower on Travis than it is locally, I think partly because it needs to install and precompile all the packages on each build.

One thing I thought was really cool about putting the examples in the docs is that clicking the “edit on github” link on any of the examples takes you to the literate file itself (thanks to a nice feature of Literate.jl). So if someone spots a typo or wants to propose another change, it should all be much easier to fix now (click edit, make the change in the GitHub UI, send the PR with a nice plain-text diff) than before (fork the repo, boot up IJulia, open the notebook, make the change, send a PR with the notebook diff).

Thanks Eric, this helps a lot!
