Forward compatibility and stability of Julia vs. Packages

I’m really confused about the Parnas’ Principles point. In particular, I don’t see how either of the two statements is even possible for open source software. One of the things we can all agree on (I assume) is that we love the fact that Julia is open source and we can inspect what the internal code is doing, if necessary. For Julia to satisfy Parnas’ Principles, wouldn’t it by definition have to be closed-source, proprietary code? Going further, are Parnas’ Principles essentially arguments against the entire idea of “open source”? After all, that is the only way to guarantee that users can “see the public API and nothing else”. Am I missing something?

I presume the suggestion was not to make the internals of packages (or Base) literally inaccessible, but rather to highlight the point that Julia does not have good support for expressing the intended public interfaces which are “guaranteed” not to break.

What a package is doing internally are internals. StatsBase didn’t advertise that it defined aliases as one of its features.
KernelDensity.jl never should have been using it. If it really wanted these aliases, it should have defined them itself.
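
For illustration, the aliases in question were one-liners; a downstream package that wants them can simply define them locally (this is the definition from the StatsBase diff quoted further down):

const RealVector{T<:Real} = AbstractArray{T,1}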

4 Likes

Where I work, we pin many open source packages. It takes months of work on regressions each time we upgrade.
It’s probably every other week, or at least once a month, that a package we hadn’t pinned updates and we get test failures, from some combination of breakage in a patch version and one package relying on another package’s internals.

5 Likes

Still, you basically have to pin most packages if you have a lot of dependencies and want a stable experience.

That’s the claim about internals (non-API). Yes, everyone else could have defined them, or StatsBase could have just kept them?! Why did StatsBase make them in the first place? And why drop them? Was there ever a real need to drop them? That’s my point. Then everyone could always just use the latest version and always upgrade without fear of breakage (except for real bugs). And not:

Sounds really bad; would it happen with Clojure?

Package authors need to be able to make mistakes with their internal design that they don’t have to stick to forever. We need to be able to delete code we don’t use so packages are easy to understand.

5 Likes

Not inaccessible in the sense of the source code being unavailable (you may want to help with the development of the code on the other side of the API). But yes, Parnas’ point was that the user doesn’t need to know and thus shouldn’t have access (marked private). And vice versa.

I’m not a fan of ever making private members truly inaccessible. If devs (or users) want to play with knives, we should let them. I think the bigger problem is that there’s no way to put caution tape up saying “watch out, there are a bunch of knives here”.

1 Like

Not according to Rich Hickey: to delete (or rename) is an imagined need. Ok, in private, but not after you publish, to respect open source users. You can have new code (and old code in the same package, in some attic/“deprecation” file).

Disk space is cheap enough that you can keep source code around forever. If you need to change, you change by adding; then nothing is broken. If you really think you need to rename/delete, then we use an alias. For binary compiled code, e.g. the precompiled native code (done fully in 1.9), you shouldn’t get two copies of it. If that’s the case, then I’m sure it can be optimized.
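
A minimal sketch of the rename-by-alias idea (both names hypothetical):

better_name(x) = x + 1        # the implementation moves to the new name
const old_name = better_name  # old callers keep working; nothing breaks

old_name(2) == better_name(2) # true: both names refer to the same function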

Thanks, I was thinking of that talk but didn’t post it; you beat me to it. Linus has an absolute “do not break userspace” rule, meaning you can’t ever drop an API. There is no internal API accessible to programs, only to drivers, and he doesn’t want to keep that stable internally. Some consider that a mistake: he could also have source-code non-breakage internally in the kernel, and arguably a stable ABI for drivers, or not (that’s also an issue of not wanting non-GPL drivers).

I doubt Rich Hickey meant we can’t have any internal objects or methods at all?

To get this clear, do you mean we can never delete even an internal underscore method?

To me, internal code refactors are central to development. If we can’t refactor our code, our ecosystem will be an incomprehensible mess.

(talking about disk space is missing the point; this is about cognitive overhead)

5 Likes

He is talking about public API.

7 Likes

Sure, when affected package authors update their packages to conform to changes in Julia, everything works again. This means “updating Julia” requires “updating (some) packages”, so the former cannot be easier/more comfortable than the latter.

Yes, that’s certainly true. Again, the user-facing effect is the same: he updates Julia, and old code (code = project + all deps) often stops working.
I agree, this is not a bad thing in itself — internal refactorings are crucial, as @Raf says. But I think this point should be clearly and unambiguously stated somewhere, so that users don’t get the wrong idea “They run PkgEval and are serious about that — so surely, code that’s in popular packages won’t stop working in newer Julia, right?”.

1 Like

It sounds like you have a lot of experience with real-world breakage. When you track down the cause of the breakages, what do you typically find? Is it packages relying on internals of their dependencies? Sudden failures of inference that packages happened to rely on for correctness?

My point here is that if, say, 90% of breakage is caused by people unwittingly using internals, then the issue of “we need a way to signify public API” is 90% of the explanation for breakage, and discussing anything else is really a side comment.

2 Likes

A.

I’m also opposed to preventing it, since that would be a breaking change. But the access isn’t the problem until a later version, where that internal name is (needlessly) changed.

I could think of static tools, or dynamic ones in Julia, similar to --depwarn; but let’s defer discussion on how to handle the change (which type of tool) until we agree the change is necessary. I think more education about how bad breakage is, even for internals, is needed.

Let’s take that as a concrete example:

No! Yes, they could have defined them themselves, but it’s easier to use types from others, and copy-pasting is bad; StatsBase could have just not removed the types! There was no need to(?); they were still “convenient”:

-## convenient type alias
-#
-#  These types signficantly reduces the need of using
-#  type parameters in functions [..]
-#
-# These could be removed when the Base supports
-# covariant type notation, i.e. AbstractVector{<:Real}
[..]
-const RealVector{T<:Real} = AbstractArray{T,1}
[..]
-const RealFP = Union{Float32, Float64}
[..]

I still support this change (I think it’s not breaking if the RealVector definition from above had been kept):

-function ecdf(X::RealVector; weights::AbstractVector{<:Real}=Weights(Float64[]))
+function ecdf(X::AbstractVector{<:Real}; weights::AbstractVector{<:Real}=Weights(Float64[]))

I.e. would this have been better: @deprecate const RealVector{T<:Real} = AbstractArray{T,1}?
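
As far as I know, @deprecate only handles method definitions, not const bindings; deprecating a binding would need something like Base.@deprecate_binding, which is itself internal and undocumented, so this is only a hedged sketch:

# Keep the binding, but warn on use (with --depwarn=yes enabled):
Base.@deprecate_binding RealVector AbstractVector{<:Real}

Users of StatsBase.RealVector would then get a deprecation warning instead of an immediate UndefVarError.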

const RealFP = Union{Float32, Float64}

is an interesting case, since its uses were replaced in the rest of StatsBase with Union{Float32, Float64}, meaning the same behavior (except breakingly dropping the alias), but the change was likely wrong, meaning it would have been even better to then go to:

const RealFP = Union{Float16, Float32, Float64}

It gets annoying to type these all over, so why not keep RealFP with the new definition? Note, such an expanded definition is not a breaking change (even if KernelDensity.jl had used the alias; and if it had, copying the definition would have been worse).
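
A small illustration (method name hypothetical) of why widening the Union is backward compatible for callers:

const RealFP = Union{Float16, Float32, Float64}  # was Union{Float32, Float64}

halve(x::RealFP) = x / 2

halve(1.0f0)       # accepted before the widening, still accepted
halve(Float16(1))  # newly accepted; existing call sites are unaffected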

Are you sure he’s just talking about that? Because I don’t recall that from his talk, and at least I think it can apply to internals too. Note, his talk isn’t just about Clojure; it’s general, also about Java I believe. And Java doesn’t have this problem: it has “private”, so you CAN change the internals in a non-breaking way. But Clojure doesn’t have private (and public etc.), and I’m still looking into how internals are handled there (it claims not to need encapsulation because of its persistent data structures). If you’re not talking about its standard library, written (mostly?) in Clojure, but the deep internals, then it’s of course built on Java, and Java on C++, both with “private”.

B.

It’s great that packages get fixed in later (or the latest) versions, but the guarantee should extend to all versions, so I assume in (all) cases this is about accessing internals. Which I think should also work, or could work.

That is potentially worse in Julia (than in C or Python). Assuming Julia has undefined behaviour (it does; how much is worthy of a new thread), you compile to native code, e.g. in precompiled packages in 1.9, but then in 1.10 the package will (presumably) be reoptimized and can get broken. And this is even with people just using the documented API of Julia and packages. In C you’re at least distributing binaries, and with Python, yes, compiled C code, which could be the same across major versions (or not).

There’s maybe no good solution if we want ever more optimization. One solution is to stop chasing that (maybe completely), or to make some of the optimizations opt-in, e.g. through macros like for LoopVectorization. Still, you would just move the problem there. Another option is for precompiled code to keep working from 1.9 to 1.10 etc. (e.g. fully into JLLs).

I think that can mostly be explained by lower expectations.

Over the last year I have updated a number of Python projects with pinned dependencies from Python 3.8 to 3.10, and it’s rarely possible to do that without upgrading the package versions and dealing with whatever breakage that introduces.

The main reason for that has nothing to do with poking into internals though (*), but with binary dependencies. Many Python packages have parts of their implementations in a second language, e.g. C. Normally users don’t notice much of this because your typical pip install downloads a “wheel” where the C code has already been compiled for your architecture. However, these pre-built wheels are specific to a given Python version, and when you update to a new Python version, chances are that your old pinned version does not have a pre-built wheel to be downloaded for your new Python version. In that case it instead tries to build it locally, and in my experience that always fails for at least one package in every project. (Actually I have never seen it succeed at all but I only get alerted by failures, so there’s a large selection bias there.)

(*) Unless C extensions need to do so. That’s outside my Python knowledge.

4 Likes

I have a likely naive proposal, which is simply to have a macro @internal that modifies the method’s doc entry, like this:

@internal"""
    f(x)

    Function that returns 1

    # Example
    ...
"""
f(x) = 1

such that ?f would print:

help> f
     >> INTERNAL structure or function: Interface may change!
        Type: ?? f for further help.

And then the doc entry is treated as an “extended help”.

Or simply @internal f(x) = 1 to just print that the function is internal without defining any documentation.
Maybe @internals begin ..... end for a bunch of internal functions. I’ve been using something like this, manually, with my packages.
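
For reference, the manual pattern might look something like this, with a standard admonition in the docstring (the function itself is hypothetical):

"""
    _normalize(w)

!!! warning "Internal"
    `_normalize` may change or disappear in any release, even a patch one.

Scale the weights `w` so they sum to one.
"""
_normalize(w) = w ./ sum(w)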

With that, at least, one can clearly document that a function is internal, which is something not completely resolved today, as it currently depends on whether or not the function shows up in the package’s documentation.

(I do see the boilerplate associated when one does not want to have a doc entry at all, though).

Right, but I had in mind a tool or switch to know about other code/packages that use those internals. All packages have internals, and it’s rather obvious what’s not exported, but not whether some other package actually uses the internals of others. Also, it may not be in a package you use directly, but rather arbitrarily deep in the dependency hierarchy.

Julia has many small packages; say A uses internals of B, then it’s unsafe to use A in case B gets updated, i.e. what A uses gets changed (internals or the documented API). But maybe this would in some cases be solved by macro-packages, C that depends on A and B etc.? I suppose upgrading B would still have the same problem, unless C depends on specific versions, and that could get tedious to specify and maintain.

And again, it’s best not to delete/rename internals, to avoid the issue. Don’t most packages that depend on internals know they’re doing it? What are some interesting cases of doing so? [In non-compiler-related packages closely related to Julia.]

First, it is definitely much better to copy-paste some type alias that you also find useful than to randomly grab it from some package that happens to define the same one.

Secondly, this take turns trivial refactoring into a maintenance burden. For example, factoring out a piece of code into an internal function suddenly means you are stuck with the behavior of that piece of code forever.

Yes, he is talking about the case when you release a breaking version.

3 Likes

I like this idea of using the code as its own documentation (the “literate” vibe).

Another naive approach that I like, used in Python, is to prefix internal methods with an _. In Julia we already have a stylistic convention to indicate mutation, namely, to suffix a bang (!). Why not just another stylistic convention for internals then, with this _?
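
A tiny sketch of how that convention reads in practice (names hypothetical):

_clean(x) = collect(skipmissing(x))        # leading _: internal, free to change

"Mean of `x`, ignoring `missing` values."  # public, documented entry point
robust_mean(x) = (c = _clean(x); sum(c) / length(c))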

Maybe a con would be the visual noise? But even then it wouldn’t bother me.

1 Like

In my experience, there are many more internal functions than external ones. I wonder if the inverse of your strategy makes sense here.

By which I mean, by default, the help message warns that the function or type is internal. And only when the package owner commits to making a function external, do they mark it as such and the warning goes away.

This has the added benefit that you won’t accidentally miss marking something as internal and implicitly commit yourself to maintaining it forever.
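
A minimal runnable sketch of the opt-in idea, assuming a hypothetical @public marker (the help-mode hook that would consult the set is not shown):

const PUBLIC_NAMES = Set{Symbol}()

macro public(ex)
    # assumes a simple short-form definition like f(x) = ...
    push!(PUBLIC_NAMES, ex.args[1].args[1])
    esc(ex)
end

@public area(r) = π * r^2  # marked public: help would show no banner
_helper(r) = 2r            # unmarked: internal by default, banner shown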

3 Likes