Package size craziness

Prelude I. I do everything on my laptop, where disk space is far from infinite (and SSDs are not cheap).

Prelude II. I work on Windows.

I ran an exercise to confirm a suspicion about the size of packages whose dependencies are handled by BinaryBuilder. For that I installed GDAL, which is a program that I happen to know a bit about (I mean, the C lib).

Not surprisingly, my suspicions were confirmed. After installing the package, its disk usage was 500 MB. A fifth of that (~100 MB) comes from installation files (*.tar.gz) that are not removed after being uncompressed. But the largest chunk of the total is down to the mingw build. I build GDAL with Visual Studio myself and the total size is about 25 MB. So going from ~25 MB to ~400 MB (the total 500 less the ~100 MB of undeleted tar files) is an awful difference.
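For anyone who wants to repeat this kind of measurement, here is a minimal Python sketch that totals the disk usage of a folder by file extension, so the share taken by leftover *.tar.gz archives is easy to see. The function name `size_by_suffix` and the example path are made up for illustration; point it at wherever the package's deps folder lives on your machine.

```python
from collections import defaultdict
from pathlib import Path


def size_by_suffix(root):
    """Return {suffix: total bytes} for all regular files under root."""
    totals = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_symlink() or not path.is_file():
            continue
        # Report foo.tar.gz under ".tar.gz" rather than ".gz".
        suffix = ".tar.gz" if path.name.endswith(".tar.gz") else path.suffix
        totals[suffix or "(none)"] += path.stat().st_size
    return dict(totals)


def print_report(root):
    """Print one line per suffix, largest first, in MB."""
    totals = size_by_suffix(root)
    for suffix, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{suffix:>10}  {size / 1e6:8.1f} MB")


# Example (hypothetical path, adjust to your installation):
# print_report(r"C:\path\to\GDAL\deps")
```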

Don’t know the solution to this, but to pretend that mingw builds are a good replacement for VS on Windows … guys take care.

I have to admit, the tone of your post really bothers me. In particular,

Don’t know the solution to this, but to pretend that mingw builds are a good replacement for VS on Windows … guys take care.

seems extremely dismissive (and rude) to the people who are working hard to provide a working ecosystem. No one is “pretending” anything. Binary packaging is difficult and completely thankless work, and providing working cross-platform binaries is even more so.

None of that is to suggest that you can’t complain about it. Just please remember that this is a best effort from people who care about getting it right.

I apologize if that was the impression I gave. Really, I did not mean to be rude or to dismiss the work of others. But I feel that this aspect of things on Windows tends to be easily overlooked.

No worries :slightly_smiling_face:

You might have more luck over in the #bindeps2 channel on Slack, which seems to be pretty lively. Removing the extra downloaded tarfiles seems like an easy win, although switching from a mingw build to VS sounds a lot scarier. Is storage space the only reason to prefer the VS build? I realize laptops are pretty constrained, but, to be fair, 500 MB worth of SSD costs about $0.09 USD these days, so it’s not as bad as it once was.

For my own builds a far more important reason is debugging. VS builds are a breeze to debug whilst mingw … I don’t even try.

But don’t minimize the size issue. Of course 500 MB is nothing on its own, but multiply it by many packages. And the next time I upgrade my disk size, it will come with a new laptop attached (my current one is a 2014 MacBook Pro, which was not cheap at all).

BTW, did you know that if you get distracted and install a package that depends on Conda (without having set the right env var, in case you already have Python installed), you will get a nice 1.7 GB Conda installation with tons of things that will never be used by that package?

Oof, yeah, that’s a lot.

That’s easy, install Linux :wink:

It could be easier on Windows 10: use Docker to run Julia.

Given that VS requires:

  • Hard disk space: up to 130 GB of available space, depending on features installed; typical installations require 20-50 GB of free space.

the 400 MB is not too bad a compromise to start with :wink:

I mean, if you require VS as a hard dependency, we will end up with users complaining about the 20-50 GB. There are people who don’t use or depend on VS :slight_smile:

Yeah, I would love it if we could get the GDAL deps folder down from 500 MB on Windows. Maybe simple additions like stripping binaries might help? Any help or tips on how to do that are appreciated on GDALBuilder (or in the BinaryBuilder docs).

One thing that will shave off about 20% is no longer having to include the dependencies. I believe handling that is on the BinaryBuilder roadmap.

If a new BinaryBuilder release is out I will try my hand at GDAL 2.3 as well.

I see the sizes are actually more or less comparable across the different operating systems: https://github.com/JuliaGeo/GDALBuilder/releases.

So no need to turn this thread into a Windows vs others thread. But general tips on how to reduce the size of builds will be welcome. This will benefit all, especially those who don’t have the fortune of high-bandwidth internet and/or large storage devices.

OK, I did not check the *nix sizes, but the Windows build can clearly be smaller. Reducing the dependencies is not a good decision. Without its dependencies (namely netCDF and HDF) GDAL is much less useful. This site has many Win builds with a lot of GDAL dependencies and the zip file is still about 40 MB.

But my issue with this type of solution is that it installs the packages (at least on Win) in a place that only Julia knows about, so they won’t be usable outside of Julia. And it doesn’t take advantage of the fact that the binary dependencies may already be installed on the system.

Sorry to pick on GDAL for this discussion, but as I said in another post, this is one program that I know well.

Who said anything about requiring a VS installation (which, though big at 3-4 GB, is still far from 20-50 GB)?

Sorry, I don’t mean getting rid of them and their functionality. But currently if you add LibGEOS.jl and GDAL.jl, the shared dependencies get installed twice. Soon it should be possible to share the installation within Julia.

That’s true. But as a GDAL.jl author, things will be much simpler for me knowing exactly what binaries users are getting, and knowing that they will essentially be compiled the same way across platforms. Not the most efficient space-wise, but if set up well, things will “just work”, and that’s worth a lot.

No, it’s a good example, and you are right that the VS binaries are much smaller. I think it is likely that this is nothing fundamental though, and that we can reduce it.

I understand that perfectly. In fact I’m faced with the same issue for GMT, but can’t the build system try to detect whether a GDAL is already installed (for instance, if gdalinfo runs OK)? The names of the GDAL shared lib change with versions, but at least on *nix there is a symlink whose name does not change. And if one is found, then there is no need to install GDAL.jl’s own dependencies.

If we can add that as a non-default option, I would be for it. But besides testing whether gdalinfo works, I also mean that I can now make assumptions about which version is installed, which formats are available, etc. If this were the default behavior, we would probably get issues from users who had it working, then uninstalled GDAL, not knowing GDAL.jl relied on it, and now have GDAL.jl no longer working either.

gdalinfo --version and gdalinfo --formats would tell you that. You are right about a user uninstalling GDAL, but to my mind people need to have a minimum knowledge of what they are doing.
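As a rough illustration of that check, here is a minimal Python sketch that looks for gdalinfo on the PATH, runs it with --version and --formats, and reports which wanted drivers are missing. The function names, the default driver list, and the assumed layout of the output lines (for example "GDAL 2.3.0, released ..." and "  GTiff -raster- (rw+vs): GeoTIFF") are my assumptions, not anything GDAL.jl does; the output layout can differ between GDAL versions.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract (major, minor, patch) from text such as 'GDAL 2.3.0, released ...'."""
    match = re.search(r"GDAL (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def parse_formats(text):
    """Extract driver short names from the indented lines of a formats listing."""
    names = set()
    for line in text.splitlines():
        # Assumed layout: '  GTiff -raster- (rw+vs): GeoTIFF' (the '-raster-' part is optional).
        match = re.match(r"\s+(\S+)\s+(?:-\S+-\s+)?\(", line)
        if match:
            names.add(match.group(1))
    return names


def probe_system_gdal(required_formats=("netCDF", "HDF5")):
    """Return a dict describing the system GDAL, or None if no usable one is found."""
    exe = shutil.which("gdalinfo")
    if exe is None:
        return None
    try:
        outputs = [
            subprocess.run([exe, flag], capture_output=True, text=True,
                           check=True, timeout=30).stdout
            for flag in ("--version", "--formats")
        ]
    except (OSError, subprocess.SubprocessError):
        return None
    version = parse_version(outputs[0])
    if version is None:
        return None
    formats = parse_formats(outputs[1])
    missing = [name for name in required_formats if name not in formats]
    return {"version": version, "formats": formats, "missing": missing}
```

A build script could fall back to its own binaries whenever this returns None or a non-empty "missing" list.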

In the Win64 GDAL library built by GDALBuilder, an easy way to trim size is to remove lib/libgdal.a (159 MB out of 314 MB).
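To see how much this would save before deleting anything, here is a minimal Python sketch that lists static archives (*.a) under a prefix's lib folder and can optionally remove them. The function names and the prefix layout are assumptions for illustration. It skips mingw import libraries (*.dll.a) to stay conservative, and it assumes nothing else will later be linked statically against these archives.

```python
from pathlib import Path


def static_archives(prefix):
    """Return [(path, size_in_bytes)] for static *.a files under prefix/lib, largest first."""
    lib_dir = Path(prefix) / "lib"
    found = [
        (path, path.stat().st_size)
        for path in lib_dir.rglob("*.a")
        # Keep import libraries such as libgdal.dll.a out of the list.
        if path.is_file() and not path.name.endswith(".dll.a")
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)


def remove_static_archives(prefix, dry_run=True):
    """Return bytes freed; files are only deleted when dry_run is False."""
    freed = 0
    for path, size in static_archives(prefix):
        if not dry_run:
            path.unlink()
        freed += size
    return freed
```

Calling remove_static_archives(prefix) with the default dry_run=True only reports the potential saving, which for the numbers above would be roughly half of the install.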

True, but that then puts an extra burden on the GDAL.jl developer to do these checks. Unless somebody else is willing to put in the work to support different GDAL installations in GDAL.jl, I would prefer to make it possible to “bring your own GDAL”, but then you are on your own as to what works and what doesn’t.

I have already spent too much time fixing broken GDAL installations on other people’s computers to get something to work. Most of them don’t know or don’t care what GDAL is.

Very true.

The “bring your own GDAL” option would have another potential (big) gain, which is the number of drivers. Currently you only have GEOS and PROJ, but not netCDF, HDF5, etc. However, since you say that you rely on knowing what formats are available, I don’t know if this would really be a benefit.