A kind of "two language problem" in Julia Ecosystem

Maybe my wording was not the most fortunate then. By “standard” I meant “not doing macro stuff,” and the examples were included in the spirit of the topic: things that I wouldn’t be able to implement using the documentation alone.

In this context, saying that “it is not documented anywhere” basically reinforces my initial post, where I said that I perceive a gap between the code I see written in various packages and the code that would follow naturally from using the documentation alone.

The examples I presented are derived from things I discovered in the source code of the Cobweb.jl package. Take a look here.

If the package author is using “internals” knowledge that is not part of the public API, it seems to me that the author might be someone who is at least familiar with that codebase - if not actually contributing to the language (I appreciate that it is improbable to encounter the string property by accident alone).

I am not knowledgeable enough to draw any strong conclusions about the use of “internals” knowledge in this scenario. For now, I am treating all of this as lessons for my Julia journey, and hopefully somebody with extensive knowledge of the language and ecosystem will provide some guidelines that help close the gap I am perceiving.

Please note that I am not disputing the things you say about not having a guarantee that this kind of code would still be allowed in future versions of Julia. Somebody with more knowledge can shine some light on the matter.

2 Likes

Isn’t that just multiple dispatch? If there was suddenly a carve out that prevented getproperty from being dispatched, that would certainly be a breaking change, no?

2 Likes

I was wrong, getproperty isn’t just for Symbols. It was implemented in Base for Tuples and Int, which isn’t surprising because the underlying getfield is actually the Core method, not getindex. It’s just a little annoying because x.1 is parsed as a float 0.1, so you have to do x. 1 or x.:1 (the :1 makes an Int, not a Symbol). I haven’t seen an equivalent for Strings yet but maybe it won’t go away anytime soon.
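
To make the tuple case concrete, here is a small sketch that calls the functions directly instead of relying on how the dot syntax is parsed. It assumes a recent Julia 1.x where Base defines getproperty for a Tuple with an Int name, as described above. The names t, Point, and p are made up for the illustration.

```julia
t = (10, 20, 30)

# Positional field access on a tuple goes through Core's getfield.
first_via_getfield = getfield(t, 1)

# Base forwards getproperty(::Tuple, ::Int) to getfield, so this gives the same value.
first_via_getproperty = getproperty(t, 1)

# For an ordinary struct, dot syntax is getproperty with a Symbol name.
struct Point
    x::Int
    y::Int
end

p = Point(3, 4)
same = p.x == getproperty(p, :x) == getfield(p, :x)
```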

Parentheses being taken by broadcasting does limit how dot syntax can be parsed to a getproperty. But you can @eval in the global scope and interpolate what you want, like @eval t.$((1,2)). That line inside a function won’t run in the function’s local scope, but you can @eval the whole function definition and interpolate the values or expressions into it. I do feel it’s only responsible to point out that interpolating a value rather than an expression means the value was evaluated before, not along with, the outer expression, and it behaves like an instance accessed via a const global variable:

Interpolating an existing mutable value versus its expression
```julia
julia> @eval function f()
         inner = $(Ref(0)) # insert existing Ref(0)
         inner[] += 1
         inner[]
       end
f (generic function with 1 method)

julia> @eval function g()
         inner = $( :(Ref(0)) ) # expression makes new Ref(0)
         inner[] += 1
         inner[]
       end
g (generic function with 1 method)

julia> function g2()
         inner = Ref(0) # makes new Ref(0)
         inner[] += 1
         inner[]
       end
g2 (generic function with 1 method)

julia> f(), f(), f()
(1, 2, 3)

julia> g(), g(), g()
(1, 1, 1)

julia> g2(), g2(), g2()
(1, 1, 1)
```
3 Likes

Relatedly, which part of Base is covered by Julia’s backward compatibility guarantee within the same major version? Documented methods or exported methods?

1 Like

This ironically is where Julia having public/private would actually help – presumably the answer is all “public” and no “private”, but export is not quite the same concept.
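
A minimal sketch of why export is not the same as public/private: export only controls which names `using` brings into scope, and unexported names remain reachable with the module prefix. The module Shapes and its functions are hypothetical, made up for this illustration.

```julia
module Shapes

export area               # brought into scope by `using .Shapes`

area(r) = pi * r^2        # intended for users
_scale(r, k) = r * k      # not exported, yet nothing blocks outside access

end # module

using .Shapes

listed = names(Shapes)            # `_scale` is not among the listed names
circle = area(1.0)                # usable unqualified because it is exported
scaled = Shapes._scale(2, 3)      # unexported names just need the module prefix
```

So the leading underscore and the missing export are conventions that signal intent, not access control.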

2 Likes

I think your response reveals even more black magic, which enhances the OP’s conjecture that we actually have another “two language problem”.

1 Like

Frequently Asked Questions · The Julia Language

4 Likes

When I read base or package code, what I often find is very terse code.

Short circuit conditionals are used a lot; here is an example from the documentation

```julia
function fact(n::Int)
    n >= 0 || error("n must be non-negative")
    n == 0 && return 1
    n * fact(n-1)
end
```

Also, functions usually don’t have explicit return statements.
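
For readers less used to this style, here is the same function next to a spelled-out version with if blocks and explicit returns. fact_verbose is a name invented for this comparison; both functions are meant to behave identically.

```julia
# Terse version, as in the documentation example.
function fact(n::Int)
    n >= 0 || error("n must be non-negative")   # right side runs only if the left is false
    n == 0 && return 1                          # right side runs only if the left is true
    n * fact(n - 1)                             # last expression is the return value
end

# Same logic with explicit conditionals and returns.
function fact_verbose(n::Int)
    if n < 0
        error("n must be non-negative")
    end
    if n == 0
        return 1
    end
    return n * fact_verbose(n - 1)
end
```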

Then there are the calls to mystery macros

@mystery

Few comments (apparently because the code is obvious).

1 Like

I don’t know the entire story, but I look for a couple things for stability.

  1. The major version being a 0 e.g. v0.22.5 indicates that the package can undergo breaking changes as the authors figure out the features. Don’t take this to mean that v0 packages are unreliable and undocumented, it’s more often the opposite. For a non-Julia example, Numba has been around for 10 years and is often one of the first examples of a Python accelerator in conversation: it is now v0.57. Don’t assume that minor revisions e.g. v0.22.5 to v0.23.0 have more widespread or chaotic changes. It’s just that any part can change, so you should rely on environments a bit more than usual.
  2. When the major version is 1 or more, minor revisions promise backwards compatibility of some features (see 3.), which is great because you don’t have to fear losing them when you upgrade for new features.
  3. The stable surface is the set of features, types, and function signatures in the package’s documentation (not docstrings!) that are not marked experimental. Anything marked experimental or not mentioned in the documentation at all is not promised to be stable across minor revisions. You can still use experimental or internal features, but you must accept the risk of having to replace a feature when a minor revision removes it. You don’t want to spend too much time doing that instead of developing your own code, which is why using stable features of v1+ dependencies is great.
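
The rule of thumb in the list above can be sketched with Julia's built-in VersionNumber type. upgrade_may_break is a hypothetical helper, not part of Pkg, and it encodes only the heuristic as described here, for documented and non-experimental features.

```julia
# Hypothetical helper (not part of Pkg): can an upgrade break documented features?
function upgrade_may_break(old::VersionNumber, new::VersionNumber)
    # Before 1.0, any release may change any feature.
    old.major == 0 && return true
    # From 1.0 on, documented features are kept until the major version changes.
    return new.major != old.major
end

upgrade_may_break(v"0.22.5", v"0.23.0")  # true
upgrade_may_break(v"1.6.0", v"1.9.2")    # false
upgrade_may_break(v"1.9.2", v"2.0.0")    # true
```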

This divide between stable and internal isn’t really a problem, more a fact of life. If all of your code is deemed stable, it’s pretty stuck there. Internal/undocumented features are pretty cool though, and I also sometimes wish there were more comments and docstrings that explain them. But I don’t expect authors to spend their time explaining to me something they didn’t intend for me to rely on; that effort is better spent on improving the documented features and adding new ones.

6 Likes

@Benny, thanks for the extensive overview.

Before starting, let me present the lens to be used to read the text below: the desire/need for Julia OSS ecosystem growth. This is the main assumption - without taking this into account, some of my comments might sound rude or against the freedom of developer expression.

Now having set the playground, here is my take on the liberty of knowledgeable developers in using internals in packages: it is disrespectful to the other developers willing to contribute, and I don’t think that having a version less than 1 can be a reasonable excuse.

The only excuse is not being able to implement a certain feature without the usage of non-public internals (which should not even be possible in a Turing complete language). Also, syntactic sugar should not rely on the internals (there is no excuse for that - given that we have these god-level macros).

I think the usage of internals in public OSS packages is hurting the Julia OSS ecosystem by promoting the intimidation factor and driving away the developers who want to help build the Julia OSS ecosystem (and maybe this can be a topic on its own).

Now, let me go into more detail.

Imagine that a fraction of new Julia developers who are starting to be confident enough are willing to contribute to the Julia OSS ecosystem. This can be done by building a new package or contributing to existing packages. Many might be interested in contributing to some package they already use (maybe something that their work relies on).

They are starting to look at the source code and encountering some features built using internals.

I don’t think it is fair to require new developers who are willing to contribute to go and dig up the Julia internals before they can participate effectively. Also, not knowing that the code they see actually relies on non-stable Julia features, some might feel that they are not experienced enough to start contributing and give up the idea altogether.

Yes, the users of the package were given a fair warning by means of the <1 version. But in this scenario, if internals are used, it seems that this is also a warning for potential contributors: if you are not skilled enough in the language internals (or willing to spend the additional time to acquire the relevant knowledge), do not approach this package. Go away and come back when the package reaches version 1 or more.

I made a small contribution to the OpenAI.jl package. However, I am not sure my contribution would even have been possible if the source code of the package had looked like a foreign language to me. Fortunately, it didn’t.

It was not my intention to hurt anybody’s feelings here. I appreciate the skills of those using internals, and I am learning a lot by reading their code. But again, the focus of my message is the OSS ecosystem, and it is my honest opinion that the use of internals in public packages is potentially hurting the community.

P. S. Maybe next year’s Julia Community Survey could add a new question: Did you want to contribute to a Julia package but give up because you encountered code that seemed foreign to you as a Julia developer?

2 Likes

I’m not sure I understand completely. Are you saying that too many packages are using internals of the Julia base language? I agree that this is not good practise, but I was not aware that was so common.

3 Likes

Given the discussion above, I think what @algunion means by “internals” is really more “Julia features that exist in a stable and semi-documented way but may not be easily discoverable by newcomers, for instance because they require extrapolation beyond what’s in the documentation”.

5 Likes

This would make more sense. Otherwise, internals are in any case not protected by the semver, and thus can change without deprecation warning or any notice. Which is a good enough reason to me to avoid using them in any registered package.

1 Like

I think there is some confusion between theoretical and applied computer science here. There are some things that are impossible to implement without internals. I think Revise.jl is a classic example: it needs to hook into the REPL to keep checking for changes in a file and updating changed functions. This has nothing to do with Turing-completeness, because Turing-completeness only concerns computable functions, that is, series of purely mathematical transformations that have nothing to do with IO or compatibility with already existing code. You can write any “data transformation” in Julia given enough code, but without relying on internals you cannot manipulate something inside an external library that has no public interface for that manipulation.

3 Likes

Well, I found that discussion a bit confusing, and it seemed like it wasn’t really about undocumented (or semi-documented) features, but about innovative uses of the features. Things like using getproperty in creative ways.

I must have missed the pertinent examples.

1 Like

That’s another way of putting it. My feeling is that the core question is the gap between the “naive” Julia that beginners tend to write, and the more “advanced” Julia that experienced users produce instead, where they use certain tricks for

  • performance
  • generality
  • reducing LOCs

Of course, documenting all of these tricks would be impossible, because many of them are just nice side effects of Julia’s generality (e.g. traits).
But a good heuristic for package developers might be to ask: if a beginner were to read my code, where would they stumble syntax-wise? Perhaps the answer is just as simple as adding an inline comment or a slightly longer docstring in these precise locations.
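
As an illustration of the kind of trick mentioned above, here is a minimal sketch of the trait pattern with the sort of inline comments that could help a newcomer. All names (SpeedTrait, Fast, Slow, speedtrait, total) are hypothetical and not taken from any package.

```julia
# Trait types: empty structs used only to steer dispatch.
abstract type SpeedTrait end
struct Fast <: SpeedTrait end
struct Slow <: SpeedTrait end

# Trait function: maps a type to a trait instance. The default is Slow.
speedtrait(::Type) = Slow()
speedtrait(::Type{<:AbstractRange{<:Integer}}) = Fast()

# Entry point: look up the trait, then dispatch a second time on it.
total(xs) = total(speedtrait(typeof(xs)), xs)

# Integer ranges have a closed-form sum, so no loop is needed.
total(::Fast, xs) = length(xs) * (first(xs) + last(xs)) ÷ 2

# Generic fallback: plain accumulation.
function total(::Slow, xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x
    end
    return acc
end
```

Nothing here is special syntax, only multiple dispatch, which is why such patterns tend to be learned from reading code rather than from the manual.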

7 Likes

I can’t speak for what @algunion means, but looking at Cobweb.jl, I don’t see anything that is actually “internal.” Base.getproperty() is documented and so is multiple dispatch. It is dispatching on a type owned by the package, so no piracy.
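
A minimal sketch of that idea, loosely inspired by the discussion and not Cobweb.jl's actual code: a package-owned type gets an extra Base.getproperty method for String names, while Symbol names keep the default field access. The type Tag and its fields are hypothetical.

```julia
# Hypothetical package-owned type, so adding methods for it is not type piracy.
struct Tag
    name::String
    classes::Vector{String}
end

# Extra method for String property names: returns a copy with one more class.
# getfield is used inside to read the real fields directly.
function Base.getproperty(t::Tag, class::String)
    return Tag(getfield(t, :name), vcat(getfield(t, :classes), class))
end

node = Tag("div", String[])
styled = getproperty(node, "highlight")   # explicit call with a String name
```

Since only the String method was added, node.name and node.classes still go through Base's default getproperty for Symbol names.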

That said, it is tricky. Which is very cool! But it also doesn’t have a docstring or comment explaining how/why this works which could definitely make even a more experienced Julia developer scratch their head.

I don’t have any specific examples off-hand, but this is not the only place where I’ve come across package code that is inscrutable to me as a newcomer. I definitely support the idea that package maintainers should strive for very low barriers of entry to understanding and contributing to our packages.

My personal philosophy with ExpandNestedData.jl has been feature > documentation > thorough comments/docstrings. But I can see the case for switching the last two, if a user wants to understand a feature they’re using, they should be able to look at the source code and really understand it.

1 Like

In my opinion, @gdalle nailed the problem. There is a huge gap between naive use of Julia and advanced use. But I think this is true of every language.

In my case, the best way to learn was to read other people’s code and follow this forum. I do not see anything bad in using internal functions of other packages, if the author is willing to update their code when that function changes.

5 Likes

@DNF, I think reading my message as a reply to what @Benny wrote is essential.

I will not reiterate here everything he said - but he talked about packages from the stability point of view. I am 100% with what he said - if I am to evaluate that from the point of view of the usage of those packages.

My answer was designed to underline that there is another angle: that of potential new developers who might want to start contributing.

2 Likes

getproperty(value, name::Symbol) is documented for dot syntax, but name::String is not, and neither is name::Int, which is implemented for value::Tuple. I would argue those function signatures for dot syntax are internal, not public. To be clear, we shouldn’t confuse undocumented internals with unfamiliar documented concepts, like the functions-as-instances idea mentioned earlier.

I don’t think this is a reasonable expectation because Henrique_Becker is right that tapping into internals is necessary sometimes. Sure, people should strive for features guaranteed across minor revisions, but I agree with Tomas_Pevny that there’s nothing wrong with maintaining your own internals along with another package’s internals.

This is the understandable frustration. Ideally people are commenting on internals usage so anybody can look at it and think “oh this is what’s happening”, and that does happen sometimes. But standing in their shoes, it is a tough ask to do that so thoroughly when the feature may be gone in a couple minor revisions and I didn’t intend for anybody except other developers to work on it. I’d think that if there was a problem with the code that someone else wanted to get involved, they should directly communicate with me (open issues, forums, email), which will be far more informative than any docstring or comment. One case where that may not be feasible is if I didn’t have time to maintain or develop the package, in which case it would be nice if I wrote down enough that someone who wanted to take the reins could do it without me, but the caveat is I probably wouldn’t have the time to do that either.

1 Like