On potentially expanding or pruning the standard library

This piracy issue is one of the proposed reasons to give LinearAlgebra its own array type.

I always found the piracy problem confusing until I put together one simple fact. If I own at least one type in the signature, I’m adding a new method to the method table. If I don’t, I may be overwriting an existing method, which has a cascading effect: any compiled code that calls the method I invalidated is also invalidated, so large swaths of precompiled code may need to be compiled again. That’s especially bad for a function as common as division.

In addition to this, your new method could implement a call that other code expects to be unimplemented (for example, code that relies on the call throwing an error to be handled).
I suppose you can sum up type piracy’s problem as changing a function’s dispatch behavior and thus the existing function calls in code you don’t own. Writing a method signature with at least 1 type you own (including that of the function) doesn’t have this problem because it can’t possibly exist in code you don’t own.
On a larger level, we want modules to work together and extend each other, but not change their original separate behavior because such change is unreliable and chaotic over the numerous permutations of packages. The cost of easy composability is a responsibility to make changes where they belong.
Some circumstances allow a derivative type-pirating package to not affect the original package’s behavior, though it’s difficult to prove and to update the derivative package in close concert with the original. One circumstance is the original package didn’t implement a function for some types and doesn’t demand the non-implementation for correct function call behavior, so the derivative package can safely fill the implementation gaps. It’s often simpler to distribute a forked package or contribute to the original package, so this isn’t common.
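
To make the distinction above concrete, here is a minimal sketch. The module Units and the type Meters are hypothetical names invented for illustration, and the String method is a deliberately bad example that should not be copied into real code. It contrasts a method that mentions a type you own with one that only mentions types owned by Base.

```julia
# Hypothetical package, for illustration only.
module Units

struct Meters
    value::Float64
end

# Not piracy: the signature mentions `Meters`, which this module owns,
# so no code outside this module could already depend on this call.
Base.:+(a::Meters, b::Meters) = Meters(a.value + b.value)

end # module Units

using .Units: Meters
total = Meters(1.0) + Meters(2.0)

# Base defines no `+` for two Strings, so this call throws a MethodError.
before = try
    "a" + "b"
catch err
    err
end

# Piracy: neither the function `+` nor the type `String` is owned here.
# From now on, every piece of code in the session sees this method.
Base.:+(a::String, b::String) = a * b

after = "a" + "b"
```

After the last definition, any code that relied on "a" + "b" throwing now silently gets a concatenated string, which is the kind of behavior change in code you don’t own that the posts above describe.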

I understand that there are reasons to excise modules from the standard library, but personally I cannot help feeling a little uneasy about the idea of removing functionality from LinearAlgebra or Pkg.
A few years back when I tried Julia for the first time, my immediate thought was: “wow that’s cool there are numpy arrays built into the language and I can just start multiplying matrices without importing something”.
Personally I would dislike it if I had to wrap arrays in some LinearAlgebra struct just to do basic matrix multiplication. Similarly Pkg is such an essential feature of Julia that it just feels unnecessary to have to install it first (although it’s probably not much to ask for to just put it in a global environment).

Another thing is that precompilation is an issue, at least in 1.9. In theory you should only have to precompile once per environment, but in practice I have to wait for precompilation loads of times.

I hope that if these packages really do get purged, it will be at a point in time when satisfactory solutions are available.

Note that for Pkg at least, the proposal is to still have it available in the REPL. It will just no longer be part of the sysimage.

Dispelling FUD here and making things a bit more concrete.

That would be a breaking change. That cannot happen. What’s on the table is an internal reorganization so that it’s possible to build Julia without BLAS more easily, for lean builds.

Note that “not being in the standard library” also doesn’t mean “not shipping with Julia”. Changing something so it’s not shipped with Julia is also a breaking change, so that won’t happen. For example, SparseArrays.jl (JuliaSparse/SparseArrays.jl on GitHub) ships with Julia because it was once a standard library. It is now a separate library maintained in a separate repo, but for backwards compatibility it still ships with Julia. This makes it a lot easier to build without SuiteSparse. Similarly, it would be nice to have a basic mechanism for simple * and \ operations in base Julia but then move the things requiring BLAS outside of Julia, so that Julia can easily build without BLAS. I am very sure the standard Julia builds will still include a BLAS though; the point is to allow custom builds without it.

That doesn’t even make sense if you think about it. If you need Pkg to install Pkg then :person_shrugging:. No, what is being proposed is what Oscar mentioned which is for it to not be a part of the system image. The Julia installation process in a nutshell is:

  1. Builds Julia
  2. Builds a system image with the standard libraries
  3. Downloads extra included libraries

and ships. What is being proposed is moving some more libraries from 2 → 3, because anything in 3 is trivial to build Julia without, while anything in 2 is a bit more baked in. The big difference of 2 → 3 is that you’re then relying on the precompiled-binary tooling of v1.9/v1.10 to make up for the latency of a library not being in the system image, which is what is currently being tested for Pkg.

Also, the standard system image is included by default with all binary builds. It’s kind of necessary because that’s the standard library, so most code doesn’t work without it. But as a consequence, having everything in (2) is the reason PackageCompiler binaries are so big. Why are they big? Because whether or not you want it, you always get a build with a complete BLAS (from LinearAlgebra) and a package manager (from Pkg). Note that BLAS is a pretty huge part of the standard system image’s size and build time (the majority IIRC). If these were moved from 2 → 3, standard Julia code would not be affected, but those pieces would be omitted from the images built into binaries unless the user code does using LinearAlgebra, Pkg. There are also plans for things like tree shaking to remove unused parts from the system image, but in general not including things is a billion times easier than removing things, and so this proposal makes the process of generating “good” binaries much simpler.

So to reiterate, what is being discussed has no usage effect beyond differences in latency. Anything more would be breaking. It would be like SparseArrays, where almost nobody even realizes it was “removed from the standard library” about a year ago because there was zero change to user code from doing so. What that means is that it’s no longer in the standard system image, reducing the startup time for people who don’t use it (since the system image size depends on everything compiled into it), and it is now easy to do a GPL-free build of Julia simply by not including SparseArrays in the build process. What is being proposed is to make it easy to do BLAS-free builds and builds of Julia binaries without Pkg. This proposal wouldn’t change how users interact with code from the REPL (if the precompilation is good enough to not cause a latency effect), but is instead meant to make it easy to build system images without the biggest and heaviest parts of the process. This would mean that a webserver can use PackageCompiler and not have BLAS or Pkg show up in the image if it doesn’t do using LinearAlgebra, Pkg, which would likely decrease binary sizes by much more than 50% and decrease the binary building time.
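
One rough way to see which of these libraries are already loaded in a given session is sketched below. It relies on Base.loaded_modules, which is an internal detail rather than documented public API, so treat it as an assumption that may change between Julia versions. The result also depends on the Julia version and on anything a startup file loads, so no particular output is implied.

```julia
# Names of all modules currently loaded in this session.
# Base.loaded_modules is internal (an assumption), not a stable API.
loaded = sort!([id.name for id in keys(Base.loaded_modules)])

# Report on the libraries discussed in this thread.
for name in ("LinearAlgebra", "Pkg", "SparseArrays")
    status = name in loaded ? "loaded" : "not loaded"
    println(rpad(name, 14), status)
end
```

Running this in a fresh session with julia --startup-file=no gives an approximate picture of what is loaded before any user code asks for it.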

So then if this doesn’t have a user effect and it’s just a detail of the build process, why are we caring about piracy?

As a final point, here is why the piracy matters. Type piracy (a) can easily invalidate code, since it causes method redefinitions on common types, and more importantly (b) can cause subtle errors. What I mean by (b) is that if code only works because of piracy, for example if LinearAlgebra is required for \(A::AbstractMatrix, b::AbstractVector) to be defined, then it’s much harder to know whether you’re allowed to remove the package without breaking user code.

In the normal case, either the function is owned by the package or one of its argument types is. This means you can check: is the operator LinearAlgebra.\? Is A a LinearAlgebra.Diagonal? If a package does not do any piracy, then if none of the package’s types and functions appear in your code, you can safely remove using Package and know that it will have zero effect on the way your code runs.

If a package does type piracy, that is no longer true. In the example above, whether LinearAlgebra exists in your image changes how \(A::AbstractMatrix, b::AbstractVector) acts, so code that makes no reference to LinearAlgebra can have its behavior changed by its presence or absence. This means that doing a custom build without LinearAlgebra can change the behavior of user code, and you have to know subtle details about how LinearAlgebra is written in order to know what behavior changes.

The simple way to solve this issue is to not allow builds without LinearAlgebra, so that the behavior is predictable for everyone. And that’s what we do today. But this is why, when the topic of building smaller binaries comes up, the biggest contributor is LinearAlgebra because it builds with OpenBLAS, and immediately the question becomes: "how can we do this in a way where, if the user doesn’t do using LinearAlgebra, they are guaranteed that their code still works?" Because something even as simple as:

julia> rand(4,4)\rand(4)
4-element Vector{Float64}:
  0.20433365580623522
  1.5079632049536102
 -1.1610567786430308
  0.03941594088643603

is using LinearAlgebra

julia> @which rand(4,4)\rand(4)
\(A::AbstractMatrix, B::AbstractVecOrMat)
     @ LinearAlgebra C:\Users\accou\.julia\juliaup\julia-1.9.1+0.x64.w64.mingw32\share\julia\stdlib\v1.9\LinearAlgebra\src\generic.jl:1101

so if you build a binary without it because the user didn’t do using LinearAlgebra, they might have code that worked on their system in their REPL but no longer works in the binary build. And again, if you don’t have piracy, then you can guarantee that won’t happen, because they will have had to do using LinearAlgebra to get a LinearAlgebra type or function.
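
The ownership check described above can also be done programmatically. This is a minimal sketch using which and parentmodule. It assumes a standard build where LinearAlgebra is available, and the variable names are just illustrative. Which module defines the selected method may differ in other Julia versions or custom builds.

```julia
using LinearAlgebra  # assumed available, as in a standard Julia build

A, b = rand(4, 4), rand(4)

# The method that dispatch selects for A \ b.
m = which(\, (typeof(A), typeof(b)))

func_owner   = parentmodule(\)          # module that owns the function `\`
method_owner = parentmodule(m)          # module that defines this method
arg_owners   = (parentmodule(typeof(A)), parentmodule(typeof(b)))

# If the method's module owns neither the function nor any argument
# type, the method fits the definition of type piracy used above.
is_pirated = method_owner != func_owner && !(method_owner in arg_owners)

println("function owner: ", func_owner)
println("method owner:   ", method_owner)
println("argument types: ", arg_owners)
println("pirated:        ", is_pirated)
```

This check only inspects concrete argument types, so it is a quick diagnostic rather than a general piracy detector.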

AFAIK Julia is distributed under the MIT license.
Also, when I check out the SparseArrays.jl package, its license just refers to the Julia license.
Could you please clarify?

The code in SparseArrays.jl is itself MIT licensed. But it calls into SuiteSparse for the sparse linear solves and such. It’s SuiteSparse whose licensing includes GPL components.

More details are at the third party licensing doc.

Thanks for the clarification.
