We could have a radically smaller Julia, at least 177 MB smaller (more with UPX compression), e.g. for compiled Julia binary executables, if we want to, by a) dropping stuff and b) compressing the rest more. Much of what I describe can be done in 1.x without breaking any compatibility; the rest needs only trivial breaking changes for 2.0.
The question is how far we are willing to go, and what we are willing to drop from Julia for a non-default variant of it (possibly to become the default later).
Anything you (or someone else) want dropped from Julia could easily be added back by those who need the missing functionality.
The standard library has a cost, and it’s greater in Julia, e.g. because of the (matrix) math that other languages do not provide in theirs. Python has also had a big standard library and is drastically reducing it in 3.13 (now in alpha):
- PEP 594 (Removing dead batteries from the standard library) scheduled removals of many deprecated modules: aifc, audioop, chunk, cgi, cgitb, crypt, imghdr, mailcap, msilib, nis, nntplib, ossaudiodev, pipes, sndhdr, spwd, sunau, telnetlib, uu, xdrlib, lib2to3.
- Many other removals of deprecated classes, functions and methods in various standard library modules.
- New deprecations, most of which are scheduled for removal from Python 3.15 or 3.16.
Most of the libraries Julia bundles can go away (those marked ? are ones I’m unsure about):
$ ls -lrSh .julia/juliaup/julia-1.10.2+0.x64.linux.gnu/lib/julia/
[…]
? -rwxr-xr-x 1 pharaldsson pharaldsson 117K mar 1 11:11 libz.so.1.2.13
[…]
-rwxr-xr-x 1 pharaldsson pharaldsson 175K mar 1 11:11 libmbedx509.so.2.28.2
? -rwxr-xr-x 1 pharaldsson pharaldsson 216K mar 1 11:11 libklu.so.2.2.1
-rwxr-xr-x 1 pharaldsson pharaldsson 222K mar 1 11:11 libopenlibm.so.4.0
-rwxr-xr-x 1 pharaldsson pharaldsson 288K mar 1 11:11 libmbedtls.so.2.28.2
-rwxr-xr-x 1 pharaldsson pharaldsson 312K mar 1 11:11 libssh2.so.1.0.1
-rwxr-xr-x 1 pharaldsson pharaldsson 463K mar 1 11:11 libspqr.so.4.2.1
probably not: -rwxr-xr-x 1 pharaldsson pharaldsson 505K mar 1 11:11 libunwind.so.8.0.1
? -rwxr-xr-x 1 pharaldsson pharaldsson 601K mar 1 11:11 libuv.so.2.0.0
-rwxr-xr-x 1 pharaldsson pharaldsson 640K mar 1 11:11 libmbedcrypto.so.2.28.2
-rwxr-xr-x 1 pharaldsson pharaldsson 653K mar 1 11:11 libpcre2-8.so.0.11.2
-rwxr-xr-x 1 pharaldsson pharaldsson 698K mar 1 11:11 libgmp.so.10.4.1
? -rwxr-xr-x 1 pharaldsson pharaldsson 715K mar 1 11:11 libgcc_s.so.1
-rwxr-xr-x 1 pharaldsson pharaldsson 728K mar 1 11:11 libnghttp2.so.14.24.1
-rwxr-xr-x 1 pharaldsson pharaldsson 739K mar 1 11:11 libcurl.so.4.8.0
-rwxr-xr-x 1 pharaldsson pharaldsson 820K mar 1 11:11 libumfpack.so.6.2.1
-rwxr-xr-x 1 pharaldsson pharaldsson 981K mar 1 11:11 libquadmath.so.0.0.0
-rwxr-xr-x 1 pharaldsson pharaldsson 1,4M mar 1 11:11 libcholmod.so.4.2.1
? -rwxr-xr-x 1 pharaldsson pharaldsson 1,5M mar 1 11:11 libgomp.so.1.0.0
-rwxr-xr-x 1 pharaldsson pharaldsson 1,7M mar 1 11:11 libgit2.so.1.6.4
-rwxr-xr-x 1 pharaldsson pharaldsson 2,5M mar 1 11:11 libmpfr.so.6.2.0
-rwxr-xr-x 1 pharaldsson pharaldsson 2,7M mar 1 11:11 libblastrampoline.so.5
-rwxr-xr-x 1 pharaldsson pharaldsson 9,1M mar 1 11:11 libgfortran.so.5.0.0
? -rwxr-xr-x 1 pharaldsson pharaldsson 13M mar 1 11:21 libjulia-internal.so.1.10.2
? -rwxr-xr-x 1 pharaldsson pharaldsson 21M mar 1 11:11 libstdc++.so.6.0.32
-rwxr-xr-x 1 pharaldsson pharaldsson 32M mar 1 11:11 libopenblas64_.0.3.23.so
-rwxr-xr-x 1 pharaldsson pharaldsson 65M mar 1 11:21 libjulia-codegen.so.1.10.2
-rwxr-xr-x 1 pharaldsson pharaldsson 91M mar 1 11:21 libLLVM-15jl.so
-rwxr-xr-x 1 pharaldsson pharaldsson 231M mar 1 11:19 sys.so
The last one, the sysimage sys.so, will not go away, but it would become much smaller: at least half the size (based on my experiment dropping LinearAlgebra and more), probably far smaller.
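To reproduce a listing like the one above from inside Julia, and to total what dropping a chosen subset would save, something like the following sketch could be used. It assumes the official Linux tarball layout (`lib/julia` next to `bin`), which may differ on other platforms or distro builds; the helper names and the `candidates` list are illustrative, not a vetted list of what is safe to drop.

```julia
# Sketch: list the bundled libraries by size (like `ls -lrS`) and total a subset.

"Return `name => bytes` pairs for regular files in `dir` (symlinks skipped), smallest first."
function file_sizes(dir::AbstractString)
    isdir(dir) || return Pair{String,Int}[]
    entries = Pair{String,Int}[]
    for name in readdir(dir)
        path = joinpath(dir, name)
        # Skip symlinks such as libz.so.1 -> libz.so.1.2.13 so nothing is counted twice.
        if isfile(path) && !islink(path)
            push!(entries, name => filesize(path))
        end
    end
    return sort(entries; by = last)
end

"Total bytes of the files whose names start with any of `prefixes`."
reclaimable_bytes(sizes, prefixes) =
    sum((bytes for (name, bytes) in sizes if any(p -> startswith(name, p), prefixes)); init = 0)

# Assumption: official Linux tarball layout, i.e. <prefix>/bin and <prefix>/lib/julia.
libdir = normpath(joinpath(Sys.BINDIR, "..", "lib", "julia"))
sizes = file_sizes(libdir)
for (name, bytes) in sizes
    println(lpad(round(bytes / 2^20; digits = 1), 8), " MiB  ", name)
end

# Illustrative subset only (OpenBLAS, its Fortran runtime, SuiteSparse), not a vetted list.
candidates = ["libopenblas64_", "libgfortran", "libquadmath",
              "libcholmod", "libumfpack", "libspqr", "libklu"]
println("Subset total: ", round(reclaimable_bytes(sizes, candidates) / 2^20; digits = 1), " MiB")
```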
B.
In addition, what is left (or everything, even if nothing is dropped) can be compressed further with UPX:
UPX will typically reduce the file size of programs and DLLs by around 50%-70%
- excellent compression ratio: typically compresses better than Zip, use UPX to decrease the size of your distribution!
- very fast decompression: more than 500 MB/sec on any reasonably modern machine
- no memory overhead for your compressed executables because of in-place decompression
It seems to support all the (tier 1) platforms Julia does (and more, e.g. NetBSD), with a few exceptions for non-tier-1 ones; I see in the code:
throwCantPack("This test UPX cannot pack .so for MIPS or PowerPC; coming soon.");
Its license also seems fine:
… or (at your option) under the GPLv+2 with special exceptions and restrictions granting the free usage for all binaries including commercial programs …
Historically the sysimage, sys.so, didn’t have any native machine code in it(?), or at least package caches didn’t, but now both do (package images since Julia 1.9), and it’s a large fraction. UPX is a compressor/“packer” specifically for executable machine code, but I believe it handles everything else too, e.g. the doc sections in the sysimage (which might also simply be dropped…).
Julia currently bundles zlib, but that is a generic compressor, not specialized for executables (x86 or ARM code) the way UPX is, which is why it, or similar generic compressors, can’t do as well here. Historically, when packages were distributed only as source code or half-compiled LLVM bitcode, it may have been the better fit.
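To try UPX on one of Julia’s bundled libraries without touching the installation, a sketch like this packs a copy into a temporary directory and reports the size ratio. It assumes a `upx` executable on PATH; `upx_ratio` is a hypothetical helper, and the library file name is taken from the 1.10.2 Linux listing above. UPX refuses to pack some shared libraries, so a failure is reported as `nothing` rather than raised.

```julia
# Sketch: pack a copy of a file with UPX and report packed/original size.
# Assumes the `upx` executable is on PATH. Returns `nothing` if upx or the file is
# missing, or if UPX declines to pack it (it refuses some shared libraries).
function upx_ratio(path::AbstractString; flag::AbstractString = "--best")
    upx = Sys.which("upx")
    (upx === nothing || !isfile(path)) && return nothing
    return mktempdir() do dir
        out = joinpath(dir, basename(path) * ".upx")
        # -q: quieter output; -o: write the packed copy to `out`, leaving `path` untouched.
        success(`$upx $flag -q -o $out $path`) || return nothing
        filesize(out) / filesize(path)
    end
end

# Assumption: the file name matches the Julia 1.10.2 Linux listing above.
lib = normpath(joinpath(Sys.BINDIR, "..", "lib", "julia", "libgfortran.so.5.0.0"))
r = upx_ratio(lib)
println(r === nothing ? "not packed (upx or file missing, or UPX refused)" :
        "packed/original = $(round(r; digits = 2))")
```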
C.
I think we should go all in: make Julia as minimal as we can, since we’re breaking compatibility with this new non-default variant anyway, though I could see compromising (for now, or not) on e.g. regex support, i.e. on smaller dependencies like the 653K libpcre2-8.so. [Julia’s standard library may also need it, for now.]
Otherwise, here are the low-hanging fruit, in order of payoff (largest first):
Historically, C had libc and a separate libm for math (I believe they’re merged on some platforms, Android? Windows?), i.e. for floating-point math functions such as square root (and much more). That made sense when it was done in software and memory was very tight. By now, the basic operators, and on common CPUs even square root, compile to individual machine instructions. I don’t believe libm has any (2D) array operations, e.g. no square root of a matrix (though element-wise operations are possible).
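One way to check what scalar square root costs in Julia is to look at the generated LLVM IR. On x86-64 and AArch64 I’d expect `sqrt(::Float64)` to be a domain check plus a call to the `llvm.sqrt.f64` intrinsic, which becomes a single hardware instruction; the exact IR differs between Julia versions, so treat this as a sketch.

```julia
using InteractiveUtils  # provides code_llvm; already loaded in the REPL

# Capture the optimized LLVM IR generated for scalar Float64 sqrt.
ir = sprint(code_llvm, sqrt, (Float64,))
println(ir)

# Besides a branch that throws a DomainError for negative input, the work is one
# call to the llvm.sqrt intrinsic, which x86-64 and AArch64 lower to one instruction.
println("uses llvm.sqrt: ", occursin("llvm.sqrt", ir))
```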
The cost of e.g. *, / (and \) or square roots is not high in Julia, for scalars or element-wise for matrices/arrays, but it’s huge (in code size, and at runtime) when they’re applied to whole matrices, and all operators/methods/functions are generic in Julia. That doesn’t mean we need to support all of them for non-scalars in the sysimage/standard library, i.e. going way beyond what most languages/libm do, e.g. C, and Python, which has NumPy separate. The 32M libopenblas64 can simply be dropped without breaking compatibility, in 1.x. It has better alternatives like BLIS.jl and MKL.jl, enabled by the 2,7M libblastrampoline.so, which alone could be kept; actually it could also be dropped at the cost of a minor inconvenience, since you would then need to add it through a package. The cost of those .so files is borne by e.g. any GUI app, regardless of whether the app is (matrix) math-based or not. OpenBLAS doesn’t work in WebAssembly, so by simply dropping it (or such capability), Julia would be more cross-platform and work better on the web (for that subset).
Julia has Downloads.download available, which I would like to deprecate. It, and Pkg, which uses it indirectly, need libcurl.so (and currently libmbedtls.so and libmbedx, which are being dropped). For many (compiled) Julia programs, e.g. CLI tools/scripts and also GUI apps, it’s just NOT needed. It’s still very convenient in the REPL, and download could be kept there (would it be possible to have functions defined ONLY there, but NOT in a general program run from that REPL?). But almost all uses of the download functionality also need some TLS/SSL library, and OpenSSL is being added to replace MbedTLS. Julia needs to be secure when you ask for downloads (and uploads), but I’m not convinced any download/upload functions (which need moving-target security libraries, when done right) should be in Julia itself, rather only in a separate stdlib that Pkg depends on (it’s already separate and not loaded by default in the REPL). That library could be used with using Security (from the General registry, ideally auto-updating), and it would bring in Downloads (I don’t see much value in non-https download capability in standard Julia) and libssh2.so. [I suppose also the 728K libnghttp2.so, for now, though it seems entirely unneeded: we want HTTP/3 and HTTP/1.1; HTTP/2 is redundant.]
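On the question of REPL-only functions: one existing mechanism is `atreplinit`, which registers a callback that runs only when the interactive REPL starts, not for `julia script.jl` or a compiled app. A sketch for `~/.julia/config/startup.jl`, assuming Downloads were no longer made available by default:

```julia
# ~/.julia/config/startup.jl (sketch)
# atreplinit callbacks run only when the interactive REPL starts, so scripts and
# compiled programs never load Downloads through this path.
atreplinit() do repl
    @eval Main import Downloads
end
```

This only controls loading; the library files would still ship with Julia unless they were also moved out into a separate package.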
What is the 21M libstdc++.so actually needed for? I think it’s only needed by Julia’s largest dependency, the 91M libLLVM-15jl.so, which CAN already be dropped, at least for compiled apps, sometimes (when you don’t need the compiler at runtime). Julia, like most projects and languages, depends on libc, but a lot of code in and out of the Julia ecosystem is not C++ and doesn’t need its standard library, so it seems libstdc++.so should only be a dependency of CxxWrap.jl and similar, for when you actually need a C++ JLL package. I suppose dropping it is a breaking change, but if done correctly it shouldn’t be one in effect, i.e. everyone would just need to update to the latest CxxWrap, which would be made to include it.
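To see whether the C++ runtime is actually loaded into a given Julia process, `Libdl.dllist()` lists the mapped shared libraries. This only shows what is loaded, not which library pulled it in (for that, something like `ldd` on each .so would be needed); on macOS the C++ runtime is libc++ rather than libstdc++.

```julia
using Libdl

# List the shared libraries currently mapped into this Julia process
# and pick out the C++ runtime and LLVM, if present.
libs = Libdl.dllist()
cxx = filter(p -> occursin("libstdc++", p) || occursin("libc++", p), libs)
llvm = filter(p -> occursin("libLLVM", p), libs)
println("C++ runtime: ", isempty(cxx) ? "not loaded" : join(cxx, ", "))
println("LLVM:        ", isempty(llvm) ? "not loaded" : join(llvm, ", "))
```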
What got me thinking this time around was seeing the title of this new bug: “Error in Sqrt for julia > 1.10.2” (JuliaLang/julia issue #54062):
sqrt(A) is failing when A is non-symmetric with repeated eigenvalues.
And I was thinking: what, square root has a bug all of a sudden? But it’s not the scalar kind (no worries, that one is fine, nor am I, for now, proposing dropping it from Julia); it’s only the matrix one.
So when did you last take the square root of a matrix? By dropping it, the rest of Julia would have a lower bug density. Some functionality can reside elsewhere, requiring using LinearAlgebra to work (I’m thinking an ENV var could do that implicitly for you, or not, so as not to require a breaking 2.0 release).
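The difference between the two kinds of square root, in a small sketch: the element-wise one is just a loop of scalar `sqrt` calls, while the matrix one solves X * X ≈ B via a factorization (eigendecomposition for Hermitian input, Schur otherwise) and lives in LinearAlgebra. The matrices are arbitrary examples chosen so the results are easy to check.

```julia
using LinearAlgebra  # matrix sqrt lives here (in 1.10 it's also loaded by default)

A = [4.0 0.0; 0.0 9.0]
B = [2.0 1.0; 1.0 2.0]

# Element-wise: a cheap loop of scalar sqrt calls.
E = sqrt.(A)

# Matrix square root: X such that X * X ≈ B (B is symmetric here, so eigen is used).
X = sqrt(B)

println(E)
println(X * X ≈ B)
```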
Actually, besides the bug, it has a performance regression “bug”: it now takes 23 sec, about 5x longer than the 5 sec in 1.6:
$ time julia +1.6 -e "sqrt([1.0 2.0; 3.0 4.0])"
I’m not sure why. Likely it’s no longer compiled into the sysimage (a good thing), but not into LinearAlgebra either; or is it a huge regression in the compiler? But both versions are fast after first use, so I’m not really worried.
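To separate plain startup time from the first-call cost of a matrix square root, one could time two fresh processes from Julia itself. This is a sketch: it measures wall time including process startup, so the two numbers are meant to be compared rather than read as “compile time”.

```julia
# Time a fresh Julia process that takes one matrix square root, which includes
# startup plus any compilation the sysimage/package images don't cover.
cmd = `$(Base.julia_cmd()) --startup-file=no -e "sqrt([1.0 2.0; 3.0 4.0])"`
t_total = @elapsed run(cmd)

# Baseline: a fresh process that does nothing, to isolate plain startup time.
t_empty = @elapsed run(`$(Base.julia_cmd()) --startup-file=no -e "nothing"`)

println("with matrix sqrt: ", round(t_total; digits = 2), " s")
println("startup only:     ", round(t_empty; digits = 2), " s")
```

Running the same script under `julia +1.6` and `julia +1.10` (via juliaup) would give the version comparison.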
I like the “platform” feature of the Roc language, i.e. that platforms are selectable. It has e.g. a CLI platform (Unix-like, i.e. with filesystem support) and alternatives like a web-programming platform, where filesystem support is neither available (client side) nor wanted (in the standard library). I suppose Julia running on such a platform, i.e. WebAssembly, could also do without, and I guess then (and only then?) the 601K libuv.so could be dropped. [Web work also benefits from a different GC strategy: eliminate GC or defer it, or at least use a non-multi-threaded GC on both client and server, and I understand Roc does things differently for the web platform.]
I’m posting this in General for now rather than in Community or Internal; it can be moved, though I’d like to hear from the public.