Package cleanup and general .julia disk consumption

The following complaint is a bit speculative because I have insufficient information. I am happy to be corrected.

I have several Julia versions installed (Windows 10) and started some cleanup, which turned out to be problematic or impossible.

The .julia folder currently contains all packages and artifacts of all installed Julia versions, and they can't be told apart from the filesystem alone, because they aren't separated into, e.g., folders named after the Julia version they belong to. The different package versions live in folders named “o35i5”, “zCmTA”, …, and the artifacts in folders like “01724eb9b7b9ddb287f7edd01bfeb1c8d64ed849”.

So, after a quick look at the excellent Pkg documentation, cleanup is done, e.g. for v1.4, as:

using Pkg
using Dates
Pkg.gc(;collect_delay=Dates.Day(0))

(Side note: excellent as it is, the documentation could be a bit more specific. I first tried Pkg.gc(;collect_delay=0), which doesn't work, because collect_delay expects a Dates period such as Dates.Day(0).)

The garbage collection made some GB free.

After cleaning up my old versions (1.0, 1.4, 1.5.0) and uninstalling them, I freed a total of 8 GB of the 24 GB in .julia, so 16 GB remain with 1.5.3 and 1.6 installed, and 1.6 is nearly unused.

My speculations now are:

  • there are still some GB left over from old versions
  • just uninstalling the old versions would have left all the data in .julia
  • Pkg.gc() run from a specific Julia version would never touch data from other versions
  • users who uninstall an old Julia version will typically not run Pkg.gc() beforehand

It seems that, because of Pkg, using Julia over time will inevitably fill disk space with data that will never be used again.

Just removing the complete .julia is not a viable option, because I would also remove the dev and environments folders and risk losing work that is not synced with the upstream repos (or exists only locally). Other folders, like prefs, may also mean data loss if removed. I could remove just the packages and artifacts folders and reinstall, but this would still leave data corpses such as conda.

I would prefer a clearly visible separation by Julia version inside .julia, so that when I uninstall Julia 1.0 I can just remove, e.g., .julia/v1.0 without having to care about what is inside. Of course, dev would then need to live somewhere else. The first layer of separation should be between user-generated and user-managed data like dev, which cannot be restored if removed, and data installed from external sources, which can easily be reinstalled. The second layer would be inside the installed data: the Julia versions should be visible there too, so that I can remove the packages of uninstalled Julia versions without having to reinstall everything for the Julia versions I still use.

Only two things are version-dependent, and they already have version-specific subdirectories, as you'd like: compiled/ and environments/. Everything else is shared across Julia versions.
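To see which Julia versions have left version-specific folders behind, one can list the `vX.Y` subdirectories of compiled/ and environments/. This is a minimal sketch: `version_dirs` is a hypothetical helper (not part of Pkg), and it assumes the depot of interest is `first(DEPOT_PATH)`, normally ~/.julia.

```julia
# Hypothetical helper: list the per-version folders (e.g. "v1.4") found in
# the two version-dependent parts of a depot.
function version_dirs(depot::AbstractString = first(DEPOT_PATH))
    result = Dict{String,Vector{String}}()
    for sub in ("compiled", "environments")
        dir = joinpath(depot, sub)
        names = isdir(dir) ? readdir(dir) : String[]
        # keep only directories named like a Julia minor version, e.g. "v1.4"
        result[sub] = sort(filter(n -> occursin(r"^v\d+\.\d+$", n) && isdir(joinpath(dir, n)), names))
    end
    return result
end
```

Calling `version_dirs()` in any Julia session shows, e.g., whether a `v1.4` folder is still around after Julia 1.4 was uninstalled. Named environments that don't follow the `vX.Y` pattern are deliberately skipped.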

Yesterday I wrote a quick-and-dirty package to help clean up active manifests and artifacts: PkgCleanup.jl

This depends on the definition.

Package X, version V, installed from Julia 1.4 should be kept separate from the identical package X.V installed from Julia 1.5.

The reason is that I would rather have duplicated data, as long as I use both copies, if it means I can easily get rid of data I know I no longer use.

Of course this is a matter of taste.

With that, I could easily try out changes in a package that no longer works in, e.g., 1.6 without changing the same package for my production version 1.5.3.

Again, this is of course a matter of taste, and there are other mechanisms for this particular task, which I would have to look up now. That is exactly why I prefer separation: it's much easier.

To put another argument into the discussion:

If I uninstall all versions of Julia and remove the complete .julia/VersionDependent, I would like to be left with a clean system, with nothing in .julia that I didn't create myself (like dev currently, but not, e.g., compiled).
In other words: I don't trust the current system to stay clean over time without accumulating old data corpses.

Honestly, I'm not sure what you mean: the separation is already reflected in the different environments and compiled files. Why would you want to download the same source code twice?

I ran another

using Pkg
using Dates
Pkg.gc(;collect_delay=Dates.Day(0))

in 1.5.3, and nothing was removed.

Then I removed the version folders of all uninstalled Julia versions in environments and compiled, and ran another

using Pkg
using Dates
Pkg.gc(;collect_delay=Dates.Day(0))

and it freed another 3 GB.
Nice, that means my assumption that

  • Pkg.gc() run from a specific Julia version would never touch data from other versions

is wrong. (Personal trust level in Pkg increasing.)

I already gave a scenario (editing a package that no longer works in a new Julia version) that would be much easier with separated data than with whatever the right way currently is.

And again, I suspect that Pkg will produce unused and obsolete data in .julia on the order of gigabytes. That would be no problem if I could just go in and delete the obsolete data, since I know I have already uninstalled Julia v1.4. If I try this now, I only find compiled/v1.4 and environments/v1.4. Deleting them frees about 500 MB instead of the 1-2 GB actually taken up by the packages and artifacts. Another Pkg.gc() is needed, and you have to know this. This is neither easy nor transparent: it takes deeper insider knowledge of Julia just to free up disk space. Freeing disk space should only require removing the data, and for that the data needs to be clearly and visibly separated, which it is not.

As I said, it’s a matter of taste.

Instead of making me explain my taste again and again, I would be more interested in yours, which is obviously different from mine, though I don't know why. Don't gigabytes of obsolete data bother you?

Let’s construct an analogous application: you install an app called ImageManipulation, which lets you manipulate pictures with high performance. For this, it creates a folder .ImageManipulation. You do some work with it and are satisfied with the results. So you uninstall ImageManipulation; no problem, .ImageManipulation stays in place. Looking for your results in this folder, you can’t find anything, because it uses a proprietary high-performance database. OK, you reinstall the app, and everything is there. You can’t get rid of the .ImageManipulation folder, which is several GB, even though your picture is just a few MB. OK, you can export your result picture and save it somewhere. Well, at this point the imaginary app is better than Julia, because in Julia I can’t just export the results of my work (in dev) to some place outside .julia. (Please try to understand the idea and don’t nitpick the weaknesses of this analogy. I know I can just copy dev elsewhere and remove .julia.)

You are assuming it’s obsolete, but if you remove the obsolete active manifests, Pkg.gc will clean up the data. If you keep the active manifests up to date, there is no obsolete data. That’s why I wrote PkgCleanup.jl. I activate local environments a lot (even more since I discovered ]activate --temp, which luckily doesn’t leave garbage behind after a restart of the machine). I have many Project.toml files scattered across my system for tasks I did a while ago. I don’t want to delete those files, since I may come back to them in the future, but when I do want to clean up the packages/artifacts requested by such an environment, I just remove it from the active manifests.

Yes, because the environments/ directory isn’t special in any way: it just holds the environment that happens to be the default one for a specific Julia version. Delete it (or remove it from the list of active manifests) and Pkg.gc will clean up all package/artifact installations that only that environment required. The same happens with any other environment on your system; Pkg.gc doesn’t really care, or know, which Julia version installed what.

I tried PkgCleanup.jl, and somehow it feels like it proves my point. First of all, I like it; it’s a good addition to Pkg.gc(). But:

PkgCleanup.artifacts() doesn’t help much, because I don’t know which artifacts are still needed by some package.

PkgCleanup.manifests() is the tool to use, but what should I do with:

   [X] C:\Users\oheil\.julia\packages\Colors\r1p4Q\Manifest.toml
   [X] C:\Users\oheil\.julia\packages\Lazy\mAoZN\Manifest.toml
   [X] C:\Users\oheil\.julia\packages\MacroTools\jYLA1\Manifest.toml
   [X] C:\Users\oheil\.julia\packages\SortingLab\LKM1T\Manifest.toml

What happens if I deselect them? I don’t know, so I don’t.

You will explain it to me, but I just wanted to free some disk space, and look where we are now.

I just wanted to remove the files in .julia that belong to v1.4 using Windows Explorer, because I uninstalled Julia 1.4. I think this is reason enough to make that possible, not because I want it, but because I assume others also want to clean up easily AND safely.

But no, now I have installed yet another package.

(I sound angry, but I’m not; I find it amusing 🙂)

Yeah, PkgCleanup.manifests() is more useful: artifacts mostly depend on active manifests. If the package containing an artifact is garbage-collected, the artifact goes away as well.

As a rule of thumb, you want to remove:

  • old Julia environments (the ~/.julia/environments/v1.x/Manifest.toml ones) you’re confident you won’t use anymore
  • any manifest outside your Julia depot ~/.julia that you’re confident you won’t use anymore. These are the easiest to miss if you use local environments a lot, and they can cause garbage that will never be collected otherwise.

The package manifests in ~/.julia/packages usually only matter because something else requests those packages (it’s unlikely that you ever explicitly instantiated C:\Users\oheil\.julia\packages\SortingLab\LKM1T\Manifest.toml).

Then delete ~/.julia/environments/v1.4 and ~/.julia/compiled/v1.4 (and run Pkg.gc(;collect_delay=Dates.Day(0))), and that’s it. I thought that was clear from my first message?
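The two-step procedure above (delete the version folders, then garbage-collect) could be scripted roughly like this. It is a sketch, not part of Pkg or PkgCleanup.jl: `remove_version_dirs` is a hypothetical helper, it defaults to a dry run, and it assumes the depot is `first(DEPOT_PATH)`.

```julia
using Pkg, Dates

# Hypothetical helper: delete environments/<ver> and compiled/<ver> for an
# uninstalled Julia version. With dry_run = true it only reports what it would delete.
function remove_version_dirs(ver::AbstractString;
                             depot::AbstractString = first(DEPOT_PATH),
                             dry_run::Bool = true)
    removed = String[]
    for sub in ("environments", "compiled")
        path = joinpath(depot, sub, ver)
        isdir(path) || continue
        println(dry_run ? "would remove " : "removing ", path)
        dry_run || rm(path; recursive = true)
        push!(removed, path)
    end
    return removed
end

# Usage: check first, then delete, then let Pkg collect what became orphaned.
# remove_version_dirs("v1.4")
# remove_version_dirs("v1.4"; dry_run = false)
# Pkg.gc(; collect_delay = Dates.Day(0))
```

The `Pkg.gc` call is kept separate on purpose: deleting the folders only makes the packages and artifacts they referenced unreachable, and it is the gc run that actually frees that space.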

IMHO, the “never” is an improvable default here. Although I generally understand why we default to not touching packages that are used by a manifest anywhere on my computer, I think deleting those would be fine as well after a longer time, for example 1-2 months. At the very least I would like to see a gc option for this, say, gc --ignore-old-manifests.

That’s a fair and reasonable suggestion.

To be clear, the “never” is my understanding of how Pkg.gc works, but perhaps the Pkg devs can correct me if I’m wrong. However, the first time I realised I had tons of very old environments in ~/.julia/logs/manifest_usage.toml, the following Pkg.gc removed something like 900 package installations. I don’t want to delete those environments, but I also don’t want them to keep packages around forever.
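To find such old environments, one can look at the manifest paths recorded in ~/.julia/logs/manifest_usage.toml and check which of them still exist on disk. This is a minimal sketch that assumes the file's top-level keys are manifest paths and that the TOML standard library (Julia 1.6+) is available; `recorded_manifests` is a hypothetical helper, and it deliberately ignores the recorded usage times rather than guessing at their exact format.

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical helper: return (manifest path, still exists?) for every manifest
# recorded in logs/manifest_usage.toml of the given depot.
function recorded_manifests(depot::AbstractString = first(DEPOT_PATH))
    usage = joinpath(depot, "logs", "manifest_usage.toml")
    isfile(usage) || return Tuple{String,Bool}[]
    return sort([(path, isfile(path)) for path in keys(TOML.parsefile(usage))])
end
```

Entries whose file still exists are the ones that keep their packages alive; deleting or moving one of those Manifest.toml files (or, as PkgCleanup.jl does, dropping it from the active manifests) is what lets a later Pkg.gc reclaim the space.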

@oheil perhaps the difference between your use case and mine is that I tend to have multiple Julia installations at the same time (I currently have 1.0 and 1.3-1.7), so reducing duplication is invaluable for me:

% du -hsc .julia/*
67G     .julia/artifacts
12K     .julia/bin
5.2M    .julia/clones
607M    .julia/compiled
3.6G    .julia/conda
8.0K    .julia/config
204M    .julia/datadeps
5.7G    .julia/dev
120K    .julia/environments
1.1M    .julia/lib
5.0M    .julia/logs
25M     .julia/makiegallery
212M    .julia/packages
4.0K    .julia/pluto_notebooks
20K     .julia/prefs
197M    .julia/registries
5.2G    .julia/scratchspaces
12K     .julia/servers
83G     total

Imagine my artifacts directory multiplied by 5 (all versions between 1.3 and 1.7). Pkg is already very smart, because it tracks the usage of packages, artifacts, and other data. I don’t think splitting the entire installation by minor version is a good idea. That was the case until v0.6, and keeping the same package across multiple versions was a painful symlink dance.