Package with large artifacts - CI runners out of disk space

pat-alt · January 18, 2024, 2:44pm

Hi there,

I’m working on a package that makes a new financial dataset, model and model activations available. The latter are quite big and I’m trying to make them available as Arrow files through artifacts (see here).

Unfortunately, they are so big that the CI runners run out of disk space: Activations to artifacts · pat-alt/TrillionDollarWords.jl@1a55e83 · GitHub. They are still small enough though to facilitate model probing in memory, which is quite neat and I’d really like to add that functionality to the package.

What’s the best way to go about this? Can I avoid downloading all artifacts during CI? Not ideal, but they are standardized, so as long as downloads work for one layer of activations, it’s reasonably safe to assume they will work for all layers.

Thanks!

giordano · January 18, 2024, 6:22pm

Quantify “quite big”.

GitHub-hosted runners now have 150 GiB of local storage, so it would be quite remarkable if you managed to run out of space there because of artifacts (but see also the request above).

Artifacts can be lazy, which means they are downloaded on demand, only when you actually need them. Sounds like this is what you want to do?
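For reference, a minimal sketch of what a lazy entry in `Artifacts.toml` could look like. The artifact name `activations_layer_1`, the URL and both hashes are hypothetical placeholders, not values taken from the package in question:

```toml
# Hypothetical artifact name; one such entry per layer of activations.
[activations_layer_1]
# Placeholder: tree hash of the unpacked artifact contents.
git-tree-sha1 = "<git-tree-sha1 of the unpacked artifact>"
# Mark the artifact as lazy so it is fetched on first use,
# not when the package is installed.
lazy = true

    [[activations_layer_1.download]]
    # Placeholder URL and checksum of the hosted tarball.
    url = "https://example.com/activations_layer_1.tar.gz"
    sha256 = "<sha256 of the tarball>"
```

With an entry like this, the package would also need the `LazyArtifacts` standard library as a dependency and `using LazyArtifacts` in the module that calls `artifact"activations_layer_1"`. The download should then only happen the first time that path is requested, so a CI run that exercises a single layer would fetch a single tarball.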

pat-alt · January 19, 2024, 6:52am

Thanks for the quick response @giordano.

They are about 300 MB each and there are 24 of them, so it adds up (to roughly 7 GB in total), but definitely not to 150 GB.

Lazy artifacts are what I had missed and really should have used in the first place, thanks!
