On potentially expanding or pruning the standard library

As I mentioned in one of my comments over in the “What steps” megathread, I find the current standard library a bit odd. In particular, my concerns primarily relate to included/excluded mathematics libraries.

I find it odd that (to my knowledge) neither Python nor C++ includes linear algebra in its standard library; users are typically strongly encouraged by their communities to add NumPy or Eigen to get linear algebra functionality. Both Matlab and Julia include linear algebra in their standard libraries, but while Matlab supplies comprehensive functionality such as iterative Krylov solvers (cg, gmres, bicgstab), Julia does not. Python, C++, and Matlab all provide special functions (e.g., gamma, erf) in their standard libraries, but Julia requires users to add a non-standard package (SpecialFunctions.jl).

I suspect, but do not know, that the Julia standard library is meant to contain just the minimal capabilities necessary to bootstrap Julia - but it seems to me that there’s a lot of functionality in LinearAlgebra that wouldn’t be necessary for that.

In the context of the “What steps” thread, and the goal of making Julia more approachable for new users who have Python or Matlab as alternatives, it seems to me that it would be good either to expand the standard library to include high-quality math packages such as SpecialFunctions.jl, Polynomials.jl, Roots.jl, etc., or to move LinearAlgebra out of the standard library (at least the methods therein not needed to bootstrap Julia) and into the care of the JuliaMath organization (and then support/simplify installing an entire organization’s ecosystem).

My personal preference would be to add methods to the standard library, to get a rough equivalent of base Matlab’s capabilities. Anyway, I just thought it might be worth continuing this discussion and getting the community’s thoughts.

4 Likes

Generally, a “standard library” in Julia is conceptually just a preinstalled package. The process of making that more of a reality is called “stdlib excision”, which means moving the standard library packages’ code out of the main Julia repository and into repositories of their own. The goal is to make it possible to upgrade standard libraries independently of Julia itself. You can see some past efforts toward that here, under the excision label.

LinearAlgebra in particular is a bit tricky, due to the way it (currently, unfortunately) pirates a bunch of stuff to make the Array type from Base work with the regular mathematical operators like * and /.

Why is excision generally favored over expansion? Well, anything that goes into Base or a standard library has to be supported for a LONG time… Making it much harder to remove, fix & rework things that turned out to be not so great. Plus, excision means being able to compile smaller sysimages that truly only contain functionality that’s actually needed by the runtime, compiler & frontend.

10 Likes

How feasible is it to partially excise a library? LinearAlgebra has many unique names that could live in a separate package, while it makes sense to keep methods whose signatures involve only Base names in Base, e.g. *(A::AbstractMatrix, B::AbstractMatrix).

That would mean either having LinearAlgebra depend on that third-party package (meaning it’s not installable without it, at which point the third-party package would effectively be a stdlib itself anyway) in order not to break existing using LinearAlgebra uses, or making a breaking release of LinearAlgebra removing the functionality (which we can’t really do without full excision, otherwise user code breaks).

Stdlibs also follow the Julia base release process, so we can’t do a LinAlg v2 until it is an upgradeable stdlib (and everyone has to write LinearAlgebra = "1.12" in their compat).

And that means we can’t remove names from LinearAlgebra.

2 Likes

Part of the concern I have is that when I talk about Julia to new people, they often ask “can Julia do X?” to which there are often a couple answers I find I’m constantly giving:

  1. Julia being Turing-complete, it can do anything computable. But so can Bash.
  2. No, Julia can’t compute \mathrm{erf}(x) nor find the roots of a polynomial. To do that you’ll either need to write your own code or add a third-party package.
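For concreteness, here is roughly what that second answer looks like in practice today; SpecialFunctions.jl and Polynomials.jl are registered third-party packages, and the values in the comments are approximate:

using Pkg
Pkg.add(["SpecialFunctions", "Polynomials"])   # needs registry access (or approval)

using SpecialFunctions, Polynomials
erf(1.0)                        # ≈ 0.8427
roots(Polynomial([-2, 0, 1]))   # roots of x^2 - 2, ≈ ±1.4142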

The second answer is quite relevant within the context of US corporations with special cyber-security considerations: aerospace, defense, energy, etc. When I was employed in one of those sectors, as part of a Fortune 100 company, every Python module, Matlab Toolbox, C++ project, or Julia package not shipped as part of the standard library had to be approved by a manager, cyber-security, counterintelligence, and legal – which is a lot of paperwork and time (it can take up to 6 weeks even when there are no issues). Just recently, one of my friends who’s still employed there mentioned that their request for HDFView was denied because cyber saw something about it “being cloud.”

So when a user installs Julia and then has to file individual requests for things that are considered “basic” functionality in languages like Python, C++, or Matlab, it feels like Julia doesn’t support those functionalities. Similarly, when the user installs Julia on an air-gapped machine (we had hundreds of these) and then has to figure out how to install SpecialFunctions.jl with no internet or intranet access, it feels like Julia doesn’t support these functionalities.

Yes, with Python the user still needs to get approval for NumPy and/or SciPy, but those are “one module, a ton of functionality” types of packages whereas Julia is largely “one package, one focused functionality”. And Matlab’s installer comes with all the toolboxes, which can be installed based on the user’s license – making it a relative breeze to add a toolbox to an air-gapped machine.

Also, when I hear arguments like

[functionalities that go into] Base or a standard library have to be supported for a long time, making it much harder to remove, fix & rework things that turned out to be not so great.

as reasons that polynomial root-finding, gamma functions, and Bessel functions shouldn’t be part of “base Julia”, it makes me wonder if Julia is still an unstable language. “How could there be so much uncertainty in the future of Julia that supporting the error function would be an undue burden?” I understand it’s not just one function that would need to be supported, but when compared against Python/Matlab/C++ it comes off as Julia not being mature. Consider that Python’s standard math library provides math.erf(), even though a “better” implementation exists in scipy.special.erf(). For the 90% of users who just want something that works, Python’s standard library suffices. The 10% (or fewer) who know they need improved performance or functionality know to go off and get SciPy.

So maybe part of what I’m asking is: could there be a “base standard library” – which contains a minimal set of functionality for bootstrapping Julia – and a “secondary standard library” that can be safely ignored for small sysimages, but is maintained (perhaps as git submodules) and distributed as part of Julia (with users able to skip installing it)?

4 Likes

You seem to be under the impression that Julia the language is a wholesale scientific simulation & computation package. That is not the case - just like Python itself isn’t. It’s the ecosystem that makes the scientific package, so to speak.

If you or your workplace is willing to sponsor efforts towards that, I’m sure someone from e.g. JuliaHub or SciML is happy to talk to you about that.

8 Likes

Yes, this is one of the raisons d’être for JuliaHub. We have an air-gapped version that can help provide the formalisms and governance required for vetting packages… and we are actively working on improving this thanks to a number of major security conscious orgs and companies.

10 Likes

The context for this thread was the megathread about “how do we increase the popularity of Julia?”. I’m not sure I’d say I’m currently under the impression that Julia is a wholesale package, but certainly tools like Python and Matlab achieve a significant portion of their popularity from the ease and clarity that their ecosystems provide.

I will say that in the early-to-mid 2010s, as a non-Julian, I was definitely under the impression that Julia was the “Matlab, Python, Fortran, C++ successor”, so I was expecting the performance of the latter and the ease of use of the former. Standard libraries and the ecosystem are definitely part of that ease of use. The first words of the introduction in Julia’s manual are:

Scientific computing

And it later goes on to say:

Julia provides ease and expressiveness for high-level numerical computing, in the same way as languages such as R, MATLAB, and Python, but also supports general programming. To achieve this, Julia builds upon the lineage of mathematical programming languages, but also borrows much from popular dynamic languages, including Lisp, Perl, Python, Lua, and Ruby.

and

The core language imposes very little;

I would argue that the core language imposes very little, but also provides very little. I just remember being very disenchanted with Julia when I spun it up and found there was almost no built-in functionality to do what I wanted, and instead a piecemeal, maze-like ecosystem that seemed constantly in flux.

So maybe the answer here is addressed by my other post on the ease of adding entire organizations’ production packages, or maybe it’s having a link on the Julia downloads page to “Pre-packaged ecosystem installation providers” like JuliaHub.

2 Likes

In my view, this argument goes the other way around: it’s because the Julia language needs to be stable that its included code needs to be relatively stationary. And new code tends to need changes, in whatever language you write it in. Once you’ve written it as a package, it can be better versioned and maintained separately. We don’t (yet) have a solid way of independently versioning the included standard libraries.

So this ‘typical argument’ isn’t saying that the language is unstable at all, rather it’s a downstream effect of the exact opposite goal.

4 Likes

As someone actively working on putting Julia code baremetal onto microcontrollers, I definitely “get” what you mean about a bare ecosystem for lots of domains - and there are few things I’d wish for more than shedding the “scientific computing” cloak Julia has surrounded itself with. I think Julia has grown to the point where this sort of almost exclusive labelling is becoming detrimental, because the people writing the “scientific” libraries are by and large regular software engineers.

However, that also comes with the acceptance that in order to gain mindshare in general (as the thread you link is about), Julia itself also can’t be tailor-made for one specific domain, such as “scientific computing”. I’m willing to bet that more than 95% of the Python use cases you have in mind are not at all possible with just its standard library - scientific computing in Python more or less lives off of NumPy, PyTorch, and the like. All of these are third-party packages.

3 Likes

I 100% agree with this; it’s largely the reason behind this post and my other post that talks about trying to simplify adding third-party packages (by adding an entire org’s production packages).

With Python, there’s at least the idea ingrained in Pythonistas that

“if you want numerical computing, simply install NumPy. If you want scientific computing solutions, simply install SciPy.”

Whereas with Julia it’s more like:

“if you want numerical computing, add Polynomials.jl, Roots.jl, Interpolations.jl, Bessels.jl, SpecialFunctions.jl, Richardson.jl, QuadGK.jl, FFTW.jl, Combinatorics.jl, NFFT.jl, Cubature.jl, FastChebInterp.jl, HCubature.jl, FunctionZeros.jl, KahanSummation.jl, RealDot.jl, Calculus.jl, IterativeSolvers.jl, LinearMaps.jl, Arpack.jl, Preconditioners.jl, AlgebraicMultigrid.jl, BenchmarkTools.jl, etc., etc.,”

And I just think that this has at least some deleterious impact on new users of Julia. If instead the Julia users’ hive-mind had the thought of “Oh, with Julia, if you want to do numerical computing, add JuliaMath and JuliaLinearAlgebra”, then we’d probably have a solution that doesn’t require expanding the standard library, though maybe we could still prune LinearAlgebra out of the standard library and into the care of the JuliaLinearAlgebra organization?

6 Likes

I mean, other than BenchmarkTools.jl, I literally don’t use any of those packages, but I’m under the impression that this kind of grouping into metapackages is exactly what organizations like SciML are doing for their subset of functionality. If your message is “there are not enough metapackages” - sure, I get that, but that’s quite a bit different compared to “I want more of my domain in the standard library”.

Perhaps you can get such a metapackage started in your domain?
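To make the idea concrete, a metapackage can be as little as a module that re-exports the packages in question. This is only a sketch: the name NumericalMathMeta is made up, and it assumes the Reexport.jl pattern rather than any existing package:

module NumericalMathMeta

using Reexport   # Reexport.jl provides @reexport to forward exported names

@reexport using SpecialFunctions   # erf, gamma, besselj, ...
@reexport using Polynomials        # Polynomial, roots, ...
@reexport using Roots              # find_zero, ...
@reexport using QuadGK             # quadgk, ...

end # module

A user would then add the one metapackage and write using NumericalMathMeta to get all of those names in a single step.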

6 Likes

We can (and I think should) excise LinearAlgebra (and thus also OpenBLAS) in 2.0, and even in 1.x (I’ve done so privately); see below. We should also excise Pkg (and the REPL) from the sysimage, but make both (transparently) available. This would make for very much smaller compiled apps and allow faster benchmarks; we’re currently blocked from the top spot on some benchmarks because of this.

We’ve excised some stdlibs already. What does that mean exactly? Could we have LinearAlgebra 1.12 bundled, so everyone gets it with Julia 1.12, 1.13, etc. implicitly, but if you add it to your Project.toml you get a potentially later version, e.g. v2 (is there even a need for breaking changes for it)? Why would people need to add any compat if 1.12 were always implicit, unless you do that? [I guess the reason could be that one of your dependencies wants version 2 but another does not, or your main code does; do we want at some point to allow different concurrent versions of the same dependency?]

Right, that’s one idea, and yes, none of it is necessary. Neither, really, is floating-point capability (with possibly a few exceptions…). I see MATLAB added page-wise left and page-wise right matrix divide “Since R2022a”, under “basic arithmetic/divide” (I’m not sure it’s basic, and it’s multilinear algebra I believe, same with the recently added tensor functionality), so we don’t have feature parity with non-toolbox MATLAB, and we will never have feature parity with MATLAB and its toolboxes in the standard library (we might be there already with the ecosystem).

Note that it’s not just those and +, -, and ^ (and their element-wise versions), but also / and \, which are also used for LinearAlgebra:

/ Solve systems of linear equations xA = B for x
\ Solve systems of linear equations Ax = B for x
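In Julia those same two operations work on plain arrays today, and both currently end up dispatching through LinearAlgebra (and OpenBLAS/LAPACK for dense floating-point inputs); a small illustration:

A = [2.0 0.0; 1.0 3.0]
b = [4.0, 5.0]

x = A \ b     # solves A * x == b
y = b' / A    # solves y * A == b'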

I would like to know how much code these operators (and the related functionality) amount to (at least OpenBLAS is huge); it’s non-small, unlike +, -, and * for scalars or their element-wise versions for arrays.

We have syntax to construct n-D Arrays, but I believe that without LinearAlgebra you only get the basic element-wise operators. When I excised LinearAlgebra I lost the rest until I did using LinearAlgebra, and to me that small breaking change would be OK, but it also seems trivial to support those operators without OpenBLAS (at least everything except / and \) with e.g. a naive matmul. To me that seems like a good trade-off: if you want something faster, you opt in to that and the rest. Then you only have a performance regression, and not really for small matrices, so it’s not even technically breaking (for the API).
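For reference, the kind of BLAS-free fallback I mean can be quite short. This is just a sketch of the idea (assuming 1-based indexing), not how Base or LinearAlgebra actually implements *:

function naive_matmul(A::AbstractMatrix, B::AbstractMatrix)
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("A has $k columns but B has $k2 rows"))
    C = zeros(promote_type(eltype(A), eltype(B)), m, n)
    @inbounds for j in 1:n, l in 1:k, i in 1:m   # column-major friendly loop order
        C[i, j] += A[i, l] * B[l, j]
    end
    return C
end

Something like this would keep matrix * working in a LinearAlgebra-free sysimage, with / and \ (which need factorizations) left to the opt-in package.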

FYI: one thing being added to the standard library is a HAMT (hash array mapped trie), i.e. for PersistentDict. (I believe something similar is in Air.jl, but this one is high-performance):
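For anyone curious, a quick sketch of what using it looks like, assuming the experimental, non-exported Base.PersistentDict API as of Julia 1.11:

d1 = Base.PersistentDict(:a => 1)
d2 = Base.setindex(d1, 2, :b)   # non-mutating update: returns a new dict
d2[:b]                          # 2
d1                              # unchanged, still only :a => 1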

We can excise libm; it seems to be done, though I’m not sure what’s holding it up:

Note that we differentiate between Base and the standard library. Things that are in Base are arguably the basis of the language, the minimal common denominator, and something like PersistentDict should only be there if it is necessary for a use case in Base.
Here it is used as an implementation detail for a feature that requires runtime support.

Everything that is in one of the standard libraries is a candidate for excision from Base, but we must be careful not to do so in a breaking fashion or significantly regress the usability of Julia.

LinearAlgebra is hard since it pirates functionality in Base. Falling back to a slow implementation is disruptive to users. I think we can make all this happen, but it will require more work than “just remove it from the sysimg”.

Any upgradeable standard library requires the user to add compat bounds when registering a package. We automatically do so retroactively in the registry, the registry bot will not let you register a new version without adding a compat bound.

We are intentionally slow with the process right now to give us time to discover issues. Statistics.jl is now done and Pkg is up next.

2 Likes

I feel something like OCaml’s approach would be suitable here: a multi-tier system.

In this case, it would be the core library for the absolute essentials, and then a base_math library or something.

Generally, I see the trend of making standard libraries smaller and smaller, since code can iterate faster outside of them.

Oh I had meant LinearAlgebra being the separate package, and it would import whatever remains from Base. Am I misunderstanding what excising is?

That’s more or less what excision already is. The issue with excising LinearAlgebra in particular is that it does A LOT of type piracy, exactly to make arithmetic on Array work. LinearAlgebra itself owns neither * nor Array, so this is a bona-fide example of type piracy, if it were an external package. Last I heard, the only reason it’s not an issue in Base is because LinearAlgebra (currently) is always part of the system image, so there’s nothing being invalidated.

On top of this - there’s not a whole lot left that LinearAlgebra could just import. It’s mostly hooking up various methods of * and such to a BLAS, as well as implementing fallbacks for AbstractArray and some of its subtypes, as far as I know.
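For a flavor of what that “hooking up to a BLAS” means, here is a heavily simplified sketch (blas_mul is a made-up name, not LinearAlgebra’s actual source):

using LinearAlgebra: BLAS

function blas_mul(A::Matrix{Float64}, B::Matrix{Float64})
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("sizes don't match"))
    C = Matrix{Float64}(undef, size(A, 1), size(B, 2))
    # Hand the dense Float64 case straight to OpenBLAS (C = 1.0*A*B + 0.0*C);
    # generic AbstractArray combinations fall back to pure-Julia loops instead.
    BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, C)
    return C
end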

1 Like

Is operator overloading in Julia not possible without type piracy?

It sure is possible, but not if you don’t own any of the types in question. For example, LinearAlgebra defines

function (/)(A::AbstractVecOrMat, B::AbstractVecOrMat)

but it owns neither / (the division operator) nor Base.AbstractVecOrMat (the union Union{AbstractArray{T, 1}, AbstractArray{T, 2}} where T). This kind of definition is really not possible without type piracy.
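To make the distinction concrete, here is a minimal, self-contained illustration (MyPkg and MyNumber are made-up names; the pirating method is deliberately left commented out):

module MyPkg

struct MyNumber
    x::Float64
end

# Not piracy: we don't own Base.:/ , but we do own MyNumber.
Base.:/(a::MyNumber, b::MyNumber) = MyNumber(a.x / b.x)

# Piracy: we own neither the function nor any of the argument types, so
# enabling this would change behavior for every user of Base's arrays:
# Base.:/(A::AbstractVecOrMat, B::AbstractVecOrMat) = error("pirated!")

end # module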