Package with large artifacts - CI runners out of disk space

Hi there,

I’m working on a package that makes a new financial dataset, model and model activations available. The latter are quite big and I’m trying to make them available as Arrow files through artifacts (see here).

Unfortunately, they are so big that the CI runners run out of disk space: Activations to artifacts · pat-alt/TrillionDollarWords.jl@1a55e83 · GitHub. They are still small enough though to facilitate model probing in memory, which is quite neat and I’d really like to add that functionality to the package.

What’s the best way to go about this? Can I avoid downloading all artifacts during CI? Testing only a subset is not ideal, but the files are standardized, so as long as downloads work for one layer of activations, it’s reasonably safe to assume they will work for all layers.

Thanks!

Quantify “quite big”.

GitHub-hosted runners now have 150 GiB of local storage, so it would be quite remarkable if you managed to run out of space there because of artifacts (but see also the request above).

Artifacts can be lazy, which means they are downloaded on demand, only when you need them. Sounds like this is what you want to do?
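
As a rough sketch, a lazy entry in `Artifacts.toml` could look something like the following. The artifact name `activations_layer_1` and the URL are made up for illustration, and the two hashes are placeholders you would need to fill in with the real values for your files:

```toml
# Hypothetical artifact name; one entry like this per layer of activations.
[activations_layer_1]
# Placeholder: tree hash of the unpacked artifact contents.
git-tree-sha1 = "<git-tree-sha1 of the unpacked artifact>"
# Marks the artifact as lazy: it is not downloaded when the package is
# installed, only the first time it is actually requested.
lazy = true

    [[activations_layer_1.download]]
    # Placeholder URL and checksum of the tarball, not the real locations.
    url = "https://example.com/activations_layer_1.tar.gz"
    sha256 = "<sha256 of the tarball>"
```

With entries like this, the package would also need `LazyArtifacts` as a dependency and `using LazyArtifacts` in the module that looks the artifact up (for example via `artifact"activations_layer_1"`). CI could then request a single layer in the tests and never download the other ones.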

Thanks for the quick response @giordano.

They are about 300 MB each and there are 24 of them, so it adds up to roughly 7.2 GB, but definitely not to 150 GiB.

Lazy artifacts are what I had missed and really should have used in the first place, thanks!