Does Julia Create a "1.5" Language Problem?

It’s the incrementally compiled package, which contains the binary code and runtime metadata. I use the word “module” because that’s how it works under the hood: a module is the runtime representation of a Julia package. It’s also related to the issue of separate compilation.

I wonder whether we can actually do this, because the build environment may differ from the user’s environment. For example, if package X’s method table depends on whether another package Y is installed, this will create an inconsistency. Many simple packages only define their own method tables or extend methods in a well-behaved way, so this may not be a problem. But it’s hard to say. This is one of my biggest concerns regarding separate compilation.


At best, the composition is covered by package extensions; otherwise it isn’t covered by precompilation, unless you make your own package or a PackageCompiler executable.

It would be extremely helpful to have a set of benchmarks for dynamic dispatch in Julia vs other languages (static and dynamically typed). Then there might be more incentive to find what the bottlenecks are and improve overall performance. Presently, I agree that the gulf between code using static dispatch (generally type stable) and dynamic dispatch (often not) is too large, which means fluctuations in inference quality can cause dramatic and unpredictable performance swings compared to going from static ↔ dynamic dispatch in other languages.
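
As a starting point, here is the kind of micro-benchmark that isolates the gap (a toy sketch with my own names; a real suite would need far more cases):

```julia
# Identical loop, two element types: Vector{Float64} gives static dispatch on
# `+`, while Vector{Any} forces a dynamic dispatch per element.
function accumulate_sum(xs)
    s = 0.0
    for x in xs
        s += x
    end
    return s
end

concrete = collect(1.0:1000.0)       # Vector{Float64}: type stable
boxed    = Vector{Any}(concrete)     # Vector{Any}: same values, untyped

# Both return the same result; only the dispatch cost differs.
accumulate_sum(concrete) == accumulate_sum(boxed)  # true
```

Timing each call (e.g. with BenchmarkTools.jl) shows the orders-of-magnitude gap being discussed.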


Are there examples showing this? My understanding is that OCaml’s GC has had a lot more work put into it, whereas historically Julia has leaned on in-place mutation + stack allocated immutable types. Reading through sources like Understanding the Garbage Collector - Real World OCaml, I notice the OCaml GC also gets to make use of tricks such as generational copying which the Julia GC can’t. That said, given the amount of active work being done to improve the Julia GC, this question might be worth its own thread :slight_smile:

I’m trying to think about what you mean by this. What advantages does a vtable lookup and devirtualization have over multiple dispatch? I’m assuming that you do not actually mean needing the usual syntax of object-oriented languages or the semantics of inheritance, but rather the relationship of methods to classes in object-oriented languages.

Specifically, in C++ when you close the class, all the applicable virtual methods have been declared. At worst, calling a virtual method involves the extra step of dereferencing a pointer.

The advantage here is that dispatch is constrained to a known set of methods. In contrast, in Julia the method table can always be extended. The missing feature in Julia is the ability to restrict the method table from being extended, at least for some functions.

Would having private functions that cannot be extended outside of a module address this in part? When a module closes, the language now knows the private functions associated with types declared in that module. It could then associate the private functions with the type via a vtable.

For example, a package such as StaticArraysCore.jl could implement a private StaticArraysCore.length. StaticArraysCore.length is then associated with all concrete types declared in that package. StaticArrays.jl would then just forward the public Base.length to the private StaticArraysCore.length.
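
A minimal sketch of that pattern, with hypothetical names (`MyStaticCore` and `_length` stand in for the hypothetical private `StaticArraysCore.length`; none of these exist in the real packages):

```julia
# Hypothetical sketch: a "closed" private function forwarded from a public one.
module MyStaticCore
    struct SVec{N}
        data::NTuple{N,Float64}
    end
    # "Private" by convention: not exported and never extended outside this
    # module, so a compiler could in principle lower calls to it via a fixed
    # vtable once the module closes.
    _length(::SVec{N}) where {N} = N
end

# The public generic function just forwards to the closed private one.
Base.length(v::MyStaticCore.SVec) = MyStaticCore._length(v)

length(MyStaticCore.SVec((1.0, 2.0, 3.0)))  # 3
```

Today nothing stops another package from adding methods to `MyStaticCore._length`; the proposal is to make that a language-level guarantee.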

I think we could build package images for users rather than having the user precompile them. One disadvantage of this, as with all pre-built binaries, is that the binary may not take full advantage of the user’s processor’s capabilities. Package images do already support multiversioning by setting JULIA_CPU_TARGET, similar to how we support multiversioning of the system image. Another challenge is relocatability, but this was recently addressed in pull request 49866, coming to Julia 1.11.
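
For reference, multiversioned package images can be requested the same way as for the system image, by setting `JULIA_CPU_TARGET` before precompiling (the target list below is only an illustrative example for x86_64):

```shell
# Precompile package images for several microarchitecture levels; the
# concrete target string here is just an example.
export JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;icelake-server,clone_all"
julia -e 'using Pkg; Pkg.precompile()'
```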

I believe the package X and Y issue is addressed by package extensions unless I misunderstand the issue.

Just before Julia 1.9 came out, I was working on putting Julia packages in conda-forge. After I saw that Julia 1.9 would include pkgimages, I decided to pause to understand how pkgimages worked. One possibility is that conda-forge could precompile the pkgimages using a JULIA_CPU_TARGET similar to the standard system image and distribute the binaries as it does for C libraries.

If there were some Hub for Julia that was running cloud infrastructure and package servers, cough, maybe they could also provide some kind of binary package service?

I don’t know where you get this impression from (“the missing feature is … to restrict the method table …”). Can you elaborate on how this feature is related to the poor performance of dynamic dispatch?

In other words, suppose all the method tables in Base were closed: would untyped Julia become as fast as untyped Python?

As far as I understand, you want to be able to implement cheap virtual functions using a virtual method table. I’m assuming this means that we could convert dynamic dispatch into a hash table lookup.

We can already despecialize arguments in Julia via @nospecialize. The other advantage that OOP has is a constrained set of known methods. If we had a mechanism to constrain the methods in Julia, I’m unclear what other advantages OOP would have. You have me thinking about Java rather than Python for comparison here.
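
As a concrete illustration of despecialization (a toy function, not from any package):

```julia
# With @nospecialize the compiler generates one shared method body instead of
# one specialization per concrete argument type; dispatch itself still happens
# at runtime against the full method table.
function describe(@nospecialize(x))
    return string(typeof(x))
end

describe(1.5)    # "Float64"
describe("hi")   # "String"
```

This saves compilation work, but unlike a vtable it does not bound the set of methods the call could reach.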

What I’m trying to do is to understand your comment below. Could you elaborate further?

Perhaps I’m starting to understand. The aspect of virtual functions you want in Julia is the ability to define an abstract function signature and return type, but not in the way that FunctionWrappers.jl implemented it. The FunctionWrappers.jl approach involves unnecessary overhead and creates a barrier that prevents compiler optimizations such as constant propagation. Is that what you mean?
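
A Base-only sketch of the kind of barrier being described (this imitates the FunctionWrappers.jl idea; it is not the package's actual implementation):

```julia
# Erase the concrete function type behind a fixed call signature. The field
# has the abstract type Function, so the inner call is an opaque dynamic call:
# it cannot be inlined, and constants cannot propagate through it.
struct Wrapped
    f::Function
end
(w::Wrapped)(x::Float64)::Float64 = w.f(x)

w = Wrapped(sin)
w(0.0)  # 0.0, but sin(0.0) can no longer be folded at compile time
```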

Personal experience: in Julia, if a language feature is provided via a macro (@) instead of a keyword, it means the Julia core developers haven’t made a final decision on that feature (maybe because its semantics are unsound). Even old macros like @fastmath and @inbounds could be removed (see recent discussions on GitHub).

This is only part of the problem.

Firstly, no matter how hard you work on improving the performance of dynamic dispatch, you still need to introduce virtual functions as an alternative to FunctionWrappers.jl, so why bother patching the current implementation of dynamic dispatch? Virtual functions are a more controllable way to structure people’s code.

Secondly, the bigger problem I want to emphasize here is that the performance of dynamic Julia is too poor:

  1. Type inference is too slow and annoys everyone.
  2. Even if you use a compilation cache, you still need to compile at least once. Julia’s installation time has become much longer, and since people need to upgrade their packages regularly, it’s not hard to see that this will become a new problem (TTFI: Time To First Installation).
  3. Dynamic dispatch is slow compared to Python, and I don’t know why. A simple hash-table lookup shouldn’t be this slow, and Julia already applies a lot of optimizations to dynamic dispatch, like the method instance cache and call stack address hashing.

I want to talk about point 3. Many Julia programs are not written with pure interpretation in mind. They do a lot of specialization and generate many wrapper functions, which hurts interpretation performance greatly. You also need to compute a hash of the input types and perform a table lookup. In contrast, OOP languages have speculative JITs, so most of the time the expensive property lookup is avoided. Anyway, Julia’s pure interpreter mode is currently just slow (maybe someone should look into it and improve it).

Maybe we can do this for Julia: a powerful multi-tier JIT that matches and outperforms the Python interpreter. It could hide compilation latency by first running in interpreter mode and then compiling the function in the background.

But besides the technical difficulties, why bother with this approach? Yes, it sounds really fancy, but no one can say for sure that it must work, because traditional JIT techniques don’t quite apply here. It still took Python’s developers several years to implement its JIT. And in Julia people are greedy: there are a lot of unfinished features and yet such a limited supply of developers (kind reminder: we have multithreading, the debugger, an improved type inferencer, effects, cloud precompilation, XXX). That would require multiple years.

It also has dramatic effects on other parts of the language’s design. For example, code analyzers can’t handle dynamic code well. If dynamic code spreads everywhere in every library, that drastically decreases the usefulness of the checker. Even in Python, people nowadays write more type annotations, not fewer, so linters can give more suggestions.

And what do you get? Dynamic Julia code that runs faster? Is that an important problem? Maybe. But Julia has many weird features; even if you find solutions for these problems, you lose the chance to work on other problems and introduce new ones.

If I’m not mistaken, Julia’s just-in-time method of compilation would prevent it from being used for low-level work. C and Rust are compiled ahead of time and can target an x86 processor, an ARM processor, or whatever CPU is in the chosen microcontroller; as far as I know, no high-level language is currently able to do that.

Your mental model is wrong, or maybe you use a different set of terminologies. Even JavaScript nowadays is compiled into machine code, so you can run JavaScript on a microcontroller as long as it has enough memory. The main problem is that microcontrollers generally have limited memory, so you can’t have a complicated language runtime with things like garbage collection.


Interesting.

I’m actually just trying to understand your proposal. At the moment the virtual function feature sounds like it would require a static compiler.

Let’s say I write an implementation of a virtual method. When does Julia compile that? When does it report if my implementation does not return the indicated return type?

Perhaps we are going to rehash an old discussion, but perhaps something has changed?

https://groups.google.com/g/julia-dev/c/N9mj_eI9wCE/m/78AwQ71YAAAJ

In which practical situations this manifest and is important? (sincere question). Looping over collections with multiple types?


…and he is the real prophet :rofl: :rofl: :rofl:. So eventually facts win over opinions.

Many things have changed since that post. For example, the “clever” union splitting was only added around Julia 1.0 (2018), which, as I have said many times in this thread, is the worst PL feature imo.
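
For readers unfamiliar with the feature being criticized, union splitting in a nutshell (toy example, my own names):

```julia
# When inference narrows a value to a small Union, the compiler emits one
# branch per union member instead of a full dynamic dispatch.
function total(xs::Vector{Union{Int,Float64}})
    s = 0.0
    for x in xs   # x::Union{Int,Float64} -> split into an Int and a Float64 branch
        s += x    # each branch calls a statically known `+` method
    end
    return s
end

total(Union{Int,Float64}[1, 2.5, 3])  # 6.5
```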

That’s why it needs to stay in the static superset of Julia (with static compilation). The reason I introduce it is to ignore all these Julia-specific issues, so I can focus on something more important.

Simple logic: if you don’t need it, then your code is type stable and can be statically typed. Now a lot of people tell you that they want to focus on prototyping and don’t need static typing, but they also don’t want to suffer from compilation latency or adopt static typing.

What does this imply? You have to speed up dynamic dispatch so that it’s at least as smooth as Python.

Me neither. Python is fairly simple: for a.f(args...) you check type(a), search a dictionary for the name f, and then call the function found there on args.... The most I’ve read for Julia is here: Julia Functions · The Julia Language. There’s not much detail; while at first glance it seems to mirror Python’s process, each step can obviously get much more complicated. Searching for a name f is far simpler than matching a call signature’s tuple type against methods whose annotated types are often supertypes, not exact matches.
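
The difference is easy to see in a toy example: Julia must pick the most specific method whose declared tuple type matches, which is strictly harder than a single name lookup (function name is my own):

```julia
# Two methods whose signatures overlap; dispatch must rank them by
# specificity, not just find the name "g" in a table.
g(x::Integer) = :integer
g(x::Number)  = :number

g(1)     # :integer  (Int <: Integer is more specific than Number)
g(1.5)   # :number   (Float64 only matches the Number method)
length(methods(g))  # 2
```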

What implementation are you talking about here? The reference implementation CPython has no JIT.

I don’t think anyone was suggesting such extremes. Dynamic typing doesn’t imply doing all type-checking and dispatches at runtime, just as static typing doesn’t imply an absence of runtime dispatch.

There have been discussions comparing dynamic dispatch vs. C++ in applications where it matters for performance (a ray tracer). But where do you feel it is non-smooth compared to Python? Are there applications in which the performance of dynamic dispatch of completely type-unstable code matters?

Or you mean mostly the indirect effects of that on TTFXs?

I made a dumb mistake here. I meant to say inline caching…

Yeah. Loops are also a problem, but I think that’s more or less solvable. These indirect effects are always overlooked. Binary caching and method tracing only solve part of the problem. Dynamic dispatch also makes static compilation unnecessarily hard. Much traditional numerical code should be well-typed and easy to compile, yet it becomes a big problem in Julia. Then, when you want to produce small binaries for your code, you begin to regret your early decisions.

Maybe now you start to understand why static compilation is not just calling LLVM to save binary code for well-typed code. Language features impact each other and have unexpected consequences. Maybe one day you want to debug your code, or collect some profiling data, in the static subset, only to find that you are not allowed to call dynamic code, so debugging becomes really hard…
