Does Julia Create a "1.5" Language Problem?

It comes from a function inside another function, not from global scope; my testing of the lines above was in the REPL, though. Regardless, the function was slow before I moved the indexing out of the comprehension. It may be a beginner error, but one that could perhaps be detected and optimized when Julia compiles it.

1 Like

Not too surprising considering that methods on types, especially ones for reflection, are intentionally type unstable; this is one of the cases where you don’t need or want to recompile the code for every input type. Code leveraging runtime dispatch isn’t often expected for future small binaries, at least not without hefty restrictions.

Worth pointing out that in some cases (likely not the fixes mentioned here), these decisions are intentional. Matrix powers and eigen (issues #35667, #41243) on real matrices return real or complex matrices (2 types) after old developments (#12304, #15573, #21184, #27212). It was not accidental and is small enough to be handled by union-splitting sometimes, though there are legitimate advantages to refactoring to type-stable methods. In other cases like DataFrames, the type instability has compilation performance advantages, though opting into more restrictions with other libraries can make type stability worthwhile. Dynamic languages exist for contexts where easily deferring work to runtime makes more sense even for performance, though like any language, fighting the tradeoffs is a pain.
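The eigen case being referred to is easy to see from the REPL; a small sketch using only the standard LinearAlgebra API (the matrices are arbitrary examples):

```julia
# For a plain real matrix, the *values* decide whether eigvals returns a
# real or complex vector, so inference can only conclude
# Union{Vector{Float64}, Vector{ComplexF64}} at best.
using LinearAlgebra

A = [0.0 -1.0; 1.0 0.0]   # a rotation: its eigenvalues are complex
B = [2.0 1.0; 1.0 2.0]    # symmetric: its eigenvalues are real

eigvals(A)                # Vector{ComplexF64}
eigvals(B)                # Vector{Float64}

# Declaring the structure up front restores one concrete return type:
eigvals(Symmetric(B))     # always Vector{Float64}
```

This is the trade-off discussed above: the 2-type union is deliberate and sometimes union-splittable, while wrapping with Symmetric (when the caller knows the structure) makes the method type stable outright.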

A worse level of myth I’ve seen than “walks like Python, runs like C” is the notion that Julia can replace every language, and I would again attribute this to clickbait and misinformed comments taking the two-language problem out of context. There are many useful paradigms and features, and Julia was never promised to do all of them better and circumvent the tradeoffs. Although the efforts to expand Julia’s capabilities are worthwhile, it is not hard to find a language that will always do a particular thing better.

3 Likes

… and last year all these dispatches were replaced by manual union splitting or annotated with @nospecialize, using JET.jl. JET.jl’s author did this work at around the same time. And reflection is definitely not intentionally type unstable. You can isolate this code into type-stable inference or make it type stable by directly manipulating jl_value_t*.

The type inferencer is a different story. It could be written in C++/C (and produce small binaries), and if so, we wouldn’t have these complicated bootstrap issues…

Firstly, I know these decisions were made deliberately, but that doesn’t mean they are correct. None of the cases I have seen can be handled by union splitting. They are not even pathological cases, just simple linear algebra computations. In all the packages I have tested, simply no one uses eigen correctly, which leads to a 5x compilation slowdown (and still infers Any). What’s worse, fixing them is not trivial: I need to investigate these linear algebra programs to see whether the input is symmetric.

I don’t want to hear any claim like “small enough to be handled by union-splitting sometimes”… Please don’t rely on union splitting. This is the worst nightmare for every code-analyzer designer. If you do that, no PL expert can save you. I spent a whole year formalizing union splitting. I went through many program-analysis papers and got nothing. Many PL papers on this topic either use algorithms with high time complexity (like SMT) or produce unreadable output. I also looked into other languages like Kotlin/TypeScript and found that they only have simple CFA, so they are not bothered by these problems.

And this is just one of the small problems in Julia type inference. We then have effects, closures, type calculation… None of them is trivial. So I gave up. I may need another 5 years to get a usable product, but if I had 5 years I would rather get a CS PhD instead of wasting the time on this problem. Let other people solve it.

If people insist on relying on these fragile language features, that’s fine. The price you pay is losing small binaries and good code analyzers, and creating a lot of artificial problems.

7 Likes

Also, for static compilation: instead of creating a subset of Julia, we need a superset of Julia and let it compile to Julia.

Why? Because no one will use your subset language if it’s restrictive. You have to replace those dynamic constructions with static counterparts so that the original expressive programs can be re-expressed in the new language. For example, TypeScript is a superset of JavaScript and supports a lot of fancy type-level manipulation. Restricting JavaScript only gives you a much worse and harder-to-use subset. Even though TypeScript does an excellent job, many people still refuse to use it.

For example, I would consider adding cheap virtual functions and OOP to this superset (by translating them to a struct with an additional method-table field). Having them makes type-stable dynamic dispatch easier, and the compiler can do devirtualization to improve performance. In contrast, FunctionWrapper is unstable and hard to optimize. This will make interaction more complicated, though.
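A minimal sketch of what such an encoding could look like in today’s Julia (all names here are illustrative, not a proposed design): the per-object method table is a concretely parameterized field, so calls through it can be inferred.

```julia
# "Object" carrying its own method table: because VT is a type parameter
# (here a NamedTuple of concrete closure types), the call through the
# table is type stable, unlike a field declared ::Function.
struct Obj{T,VT}
    data::T
    vtbl::VT
end

# Single-dispatch "virtual call": look the method up in the object's table.
area(o::Obj) = o.vtbl.area(o.data)

circle = Obj((r = 2.0,), (area = d -> pi * d.r^2,))
square = Obj((s = 3.0,), (area = d -> d.s^2,))

area(circle)   # ≈ 12.566
area(square)   # 9.0
```

A real superset-to-Julia translation would presumably share one table per “class” rather than per object, but the inference behavior is the same.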

2 Likes

I disagree with this. A subset is the most natural way to go.

You can write your dynamic Julia as always, like many people do when they’re prototyping and navigating ideas. Once you’re done prototyping and you want maximum speed, you can rewrite more statically, with the added bonus of optional static compilation.

1 Like

It never works. This has proven to be a dead end. You must do this from scratch in a richer static language.

  1. People seldom rewrite their code; see the Google paper I quoted above. Rewriting incurs context switching, and it’s costly to refactor code, especially if your type definitions are not well-typed. This is an innate limitation of human physiology and psychology.
  2. Restrictive languages are unwelcome, and no one wants to use them. What’s worse, Julia’s type system is already weak enough that there’s no cheap way to perform dynamic dispatch. Only TypeScript and Flow got the idea right, and that’s why they became popular (I want to emphasize that the emergence of TypeScript was a revolutionary event in the history of PL).
  3. Interaction between the dynamic and static subsets will eventually introduce new semantics (and create a new language). For example, is the user allowed to extend method tables defined in the static subset? If only some of them are, how should we mark them? Instead of using a lot of macros, I would rather use keywords to mark them.
3 Likes

Relax. You are going too far. If your arguments were true, would all SciML projects be doomed?
Using Julia, one of my computational projects reached the same speed as its previous C++ implementation with just a little extra effort after the prototyping stage.

6 Likes

My feeling is that the most important application of static compilation is using Julia to create libraries to be called from other languages, such as C, C++, and Fortran. For that we need fully inferred code with support for allocations and GC. What you propose seems to be more general, but perhaps not what most (?) of us would like to see.

3 Likes

You seemingly have different criteria but I would count @nospecialize annotations, including those in reflection.jl, as introducing runtime type work (not too badly, inference still does its best for callers despite runtime type checks, @nospecializeinfer is the propagating inference limiter).

Iteration and many base functions rely on union-splitting to handle nothing returns, which is less of a problem because nothing inputs are such a dead end. I agree that whenever possible, each input type should result in a particular output type; my point was that it doesn’t immediately devolve into ::Any inference and can sometimes be managed.
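The nothing-return pattern being described can be written out by hand; a small example using only Base functions:

```julia
# findfirst returns Union{Nothing, Int}. The explicit branch on nothing is
# exactly what union-splitting does for us: each arm is inferred with a
# concrete type, and the Nothing arm is a dead end.
function first_even_or_zero(xs::Vector{Int})
    i = findfirst(iseven, xs)
    if i === nothing
        return 0        # Nothing arm: nothing else to compute
    else
        return xs[i]    # here i is known to be Int, so xs[i] is Int
    end
end

first_even_or_zero([1, 3, 4, 5])   # 4
first_even_or_zero([1, 3, 5])      # 0
```

Despite the Union in the middle, the function's return type is inferred as plain Int, which is the “managed” outcome meant above.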

Well sure, I don’t think anyone was expecting a small binary to swap arbitrary column types in a large mutable table. Doesn’t seem very open-minded to call DataFrames fragile.

If you replace type-instabilities and the existing runtime dispatch with restricted substitutes, then you don’t have a superset of Julia, but a superset of a subset of Julia.

That is a neat way to do runtime dispatch, despite how much multiple dispatch enthusiasts criticize the limitations, but I’m not as hopeful about composing such OOP classes with multiple dispatch. From the few attempts I’ve seen, it also appears to be a “dead way”. Lisp has OOP and multiple dispatch, and the classes don’t encapsulate singly-dispatched methods.

On the other hand, I’m not as hopeless about program analysis. Python is more dynamic than Julia, but after slapping on a parallel informal type system, static analysis at scale is feasible e.g. Pyre for Instagram. Sure, it’ll never be as simple and guaranteed as in a statically typed language, but that doesn’t mean we’re stuck where we are.

5 Likes

I don’t know whether all SciML projects are doomed, because I am not one of their users… But at least I can say that my arguments imply that developing a checker/compiler to compile them is doomed. @Benny also said type instability is a helpful trick to speed up compilation and that dynamic dispatch is not considered for small binaries; what do those two arguments imply?

SciML is a different question. Currently this is the only case where I would strongly recommend Julia to other people: when you want to do a lot of code generation. And the success of SciML/DataFrames exactly proves my point: it’s nearly impossible to have a useful type checker for current Julia, because many popular Julia projects use a lot of code generation.

It’s generally impossible to design a static checker for code generation; this is a well-studied PL problem. So it puts all static languages at a disadvantage… ML falls into this category. Even though a lot of ML code is written in Python, it doesn’t create the kind of typing problems seen in traditional business-oriented PLs (Java).

Julia is really good at this, partially because a lot of compiler magic is hacked into the type inferencer and the type system (like small-union splitting and other control-flow-sensitive rules). Now you pay the price for these features: they are inherently hard to type check and compile.

  1. How do you know your code is fully inferred? A checker?
  2. Fully inferred code may still call other uninferred code, maybe written in another library. How would you handle that?
    Essentially, this is a non-scalable hack serving only a few current Julia users, because I can say confidently that many Julia packages on GitHub don’t satisfy your standard, and it takes nontrivial effort to rewrite them.
2 Likes

I think this case is specially handled in the compiler (because lowering of a for loop immediately places a case split after iterate), so it’s not automatic union splitting.

I mean that relying on type instability, instead of using the reflection mechanisms of those static languages, is fragile. It poses challenges for compiling and caching.

Yeah. I also think it’s too hard to compose OOP with multiple dispatch; even something like Go’s interfaces is impossible. Maybe we can have OOP as a separate mechanism, but people may not want to accept that. Still, since the other options are even worse (like developing a sophisticated analyzer), I have to consider this option.

I won’t consider program analysis, as I mentioned before. It’s not that it can’t detect enough bugs. It’s suitable if you have a separate team to apply the tool and file issues regularly (this is also how academic PL researchers design such tools). But programmers generally have different expectations of such a tool; for example, imprecise warnings annoy programmers. Also, untyped Python runs much faster than untyped Julia, which puts higher requirements on the checker.

1 Like

I’m sure there is truth behind your statement, but I would recommend lighter phrasing, because I personally have “prototyped and navigated ideas” first with slower code and then gone through and optimized it (typically making it more static in the process) many times.

3 Likes

But that’s different, right? You make it “more” static, not 100% static. A static subset doesn’t accept 99%-static code; it will reject it. A static compiler behaves quite differently from the current Julia compiler: current Julia treats types as optimizations, while a static compiler is strict. You can use types to optimize your code, but that doesn’t quite carry over to the static-compilation problem. So you have to get all your work done before you move to this subset.

This is precisely why the gradual-improvement model fails badly, and why we need to be careful about creating a subset of the language. We would have a subset language and a full-set language, but what about the transition between them? You still need a powerful type checker that can handle both static and dynamic code so that programmers can be guided through the incremental process. But if you can systematically design one, why not just add it to the language? And if you can’t, then at some point some programmers may get stuck converting their dynamic programs to static ones.

2 Likes

No, the only special handling is the lowering of the for loop. The union splitting is visible in @code_warntype and shows up the same if you replaced a looped iterate with any other method.
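To see this for yourself, the for loop can be desugared by hand; @code_warntype reports the same Union{Nothing, Tuple{…}} and the same split either way (a sketch using only Base):

```julia
# Roughly what `for x in v` lowers to: iterate returns
# Union{Nothing, Tuple{Int, Int}} for a Vector{Int}, and the !== nothing
# check is the case split that union-splitting specializes on.
function manual_sum(v::Vector{Int})
    s = 0
    it = iterate(v)                 # Union{Nothing, Tuple{Int, Int}}
    while it !== nothing
        x, state = it
        s += x
        it = iterate(v, state)
    end
    return s
end

manual_sum([1, 2, 3])   # 6
# @code_warntype manual_sum([1, 2, 3])  # shows the Union and the split
```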

That’s possible, Python has a multipledispatch package and some OOP languages kind of have the syntax with function overloading. The problem is putting classes in real argument annotations (not the type hints Python has at most); I summed up the attempts I’ve seen in this comment. And those attempts were emulating Python, an OOP language with no formal type stability, and it’ll only get more awkward and brittle when considering that crucial feature. It becomes clearer why Lisp is the way it is.

I don’t remember where I read this, but someone speculated mimicking virtual method tables by fixing the specializations (and perhaps their return type) a runtime dispatch could do and throwing an error otherwise, even if the table was derived empirically from a running program. No syntax concept was provided, it was very speculative.

I think runtime dispatch is faster in Python than in Julia (the type checking is less complicated, at least), but in my experience this doesn’t bear out in practice, because slower code only hurts the program’s performance if it occupies a significant portion of the runtime. Julia has type-inference recovery practices for the critical portions; Python has C-based libraries; it just evens out.

2 Likes

The last talk at JuliaCon Eindhoven Local, Myrthe Schepper’s talk on “ASML’s Julia Journey”, references the earlier talk about the “1.5 language problem”.

There she touched on the two-culture problem that had been articulated by @MatthijsCox in his Scientific Coder blog.

Some of the issues Julia faces are a manifestation of Conway’s Law. In short, we are likely to end up with two dialects of Julia as we try to integrate both the interests of researchers and software engineers.

We previously discussed some of these items in two prior threads.

What I see in Schepper’s talk is that the two-language problem is a very real one at industrial scale. As a research software engineer and a scientist, this is also a problem that is persistent in academia across multiple domains. Real money is spent porting dynamically typed code to statically typed code. The person writing the dynamically typed code is often not the same person writing the statically typed code.

A counterpoint to this is that it may not be a single person that writes their dynamic code and then rewrites it statically. The two or more people have to be able to communicate and need to have some comprehension of what the others are doing. While it is difficult and costly to refactor codes, some organizations have a strong interest in both producing research code and refactoring that code.

I have run into this situation across a number of languages recently.

Developers have already accepted that not all Kotlin programs are going to be able to ported to Kotlin Native, but they still see utility in being able to build native binaries. For me, the outstanding question is “Can Julia offer a better experience than the above existing solutions or using two languages?”.

10 Likes

Actually, in my specific case, I think it’s basically 100% static. But then again, I don’t have many types much more complicated than Vector{Float64} and some structs to hold input parameters.

I know how it lowers, and I just want to point out that in this case there’s no ambiguity… I just don’t consider this “automatic” union splitting (manually if-elseing on a union also doesn’t count), because here the shape of the control flow is not changed and each basic block is specialized according to the input type. This is always a safe operation, since the input CFG is preserved. In contrast, splitting f(x) into “if x isa X1 then f(pinode1(x)) else f(pinode2(x))” changes the CFG greatly, and I consider that “automatic” union splitting, because the programmer didn’t explicitly request a split there.

You mean @thautwarm? Last year I interned at his company and we discussed the possibility of such an approach. He hypothesized that we could generate a unique id for each of these specialized method tables (and any other metadata), so that the runtime lookup becomes cheap. Currently he is also experimenting with static compilation, but our views differ: speculation makes debugging too hard, and I definitely won’t try to do that.

I think this is more important than we originally thought, and it has a profound impact on Julia.

The first question is how slow untyped Julia is compared to untyped Python. I didn’t conduct detailed experiments, but a simple quicksort program shows around a 5x gap (and that’s untyped compiled Julia, not even interpreted Julia). I think testing other LeetCode problems would yield similar results.
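The kind of measurement described can be reproduced along these lines (a sketch; the 5x figure is the poster’s, and actual ratios will vary by machine): run one quicksort over a Vector{Int}, where everything is inferred, and over a Vector{Any}, where every comparison and assignment goes through runtime dispatch.

```julia
# In-place Hoare-partition quicksort; the code is identical for both runs,
# only the element type of the container changes.
function qsort!(a, lo = 1, hi = length(a))
    lo >= hi && return a
    p = a[(lo + hi) >>> 1]
    i, j = lo, hi
    while i <= j
        while a[i] < p; i += 1; end
        while a[j] > p; j -= 1; end
        if i <= j
            a[i], a[j] = a[j], a[i]
            i += 1; j -= 1
        end
    end
    qsort!(a, lo, j)
    qsort!(a, i, hi)
    return a
end

xs = rand(1:10^6, 10^5)
qsort!(copy(xs)); qsort!(Vector{Any}(xs))   # warm up: compile both methods
@time qsort!(copy(xs))                      # Vector{Int}: fully inferred
@time qsort!(Vector{Any}(xs))               # Vector{Any}: dispatch per compare
```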

5x is a huge gap (not even counting compilation latency). If loading a Python library takes 1 second, it will take more than 5 seconds in Julia. This makes me believe that most Julia code should be compiled and typed; otherwise you will spend a lot of running time in glue code. Plots.jl is one example: without the recent refactoring to remove invalidations, the latency was horrible. That’s why I think a static superset is more helpful. Even after 1.9, the compilation latency of dynamic code is hard to improve, and the development experience degrades quickly as codebases grow.

Julia also has no counterpart to “C-based libraries”. Even if you have recovery practices, they contribute to the compilation latency… and they require library writers to annotate their critical components. In Python, programmers just don’t need to do this.

This is a valid point, but it also puts a high requirement on the user group…

(and maybe React Native). But currently, except for Java/GraalVM, none of these techniques has gotten wide usage. Kotlin Native is still immature and acts more or less as syntax sugar for Swift on iOS: you can’t use any of Kotlin’s built-in libraries (though you can import the iOS developer kit). Julia was once shown to be superior to the broken combination of Cython/Python, but now Julia is trying to create its own dialect…

There are some, like LinearAlgebra (OpenBLAS) and FFTW.jl (FFTW).

You just can’t in Python. To go static you must dive into other languages (not necessarily disjoint ones; Cython is a superset). I’m not certain about the pure-Julia analogy, but the language barrier there is like type-stable function barriers.

If you only use simple types, then I think a fully static type checker is still beneficial for your workflow, because it catches many trivial spelling errors in advance.

My SimpleTypeChecker.jl adopts an approach similar to TypeScript’s. It always typechecks every function in the file, but skips any code or functions marked with @ignore. So even when writing static Julia code, I can still use dynamic code to debug and prototype.

When types and functionality get more complicated, refactoring will be more difficult. Sometimes you may need to outline a function barrier or add if-else branches manually to ensure type stability.
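The function-barrier refactor being referred to, in its simplest form (names here are illustrative): keep the unstable value in an outer function and pass it through one call, so the inner kernel is compiled for the concrete type it actually receives.

```julia
# An unstable source: the return type depends on runtime values, so it is
# inferred as Union{Vector{Float64}, Vector{Int}}.
untyped_load() = rand(Bool) ? rand(10) : rand(1:10, 10)

# The barrier: this kernel gets specialized per concrete Vector type, so
# the reduction inside runs at full speed.
kernel(v) = sum(abs2, v)

function total(n)
    s = 0.0
    for _ in 1:n
        v = untyped_load()   # unstable here...
        s += kernel(v)       # ...but dispatch happens once per call,
    end                      # then the loop inside kernel is fast
    return s
end
```

The cost of the runtime dispatch is paid once per kernel call instead of once per element, which is why this pattern keeps showing up in the refactors discussed above.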

…what I mean is that, unlike Python’s C libraries, Julia’s incremental modules can’t be distributed separately, and there’s no stable API for them. If there were, Julia’s latency would be a less serious problem.

The installation time of recent Julia is absurd. I can understand why, but it drives me crazy. It takes me several minutes to install or update one package, because so much time is spent recompiling packages. I have to interrupt the process and revert to the old behavior of Julia 1.0 (precompile only when using the package). Most of this compilation time is wasted, because I use only a fraction of the functionality.

If I can directly install prebuilt binary like Python, then the development experience will be smoother.

Not sure what you mean here, that sounds like a package but obviously isn’t.

That’d be nice, but it has its own hurdles. The really popular libraries will have builds for the major OSs and architectures, but some developers will not have the resources or knowledge for that, so building from source is a common fallback. I don’t know if this is up to date anymore, but I remember the SciPy docs mentioning that one should expect >3 hours to build SciPy from source on Windows. Julia’s divided, composable packages really cut down on that time, but so far we must personally precompile, and it likely won’t even cover all the compilation we’ll ever do. I wonder if there could be a standard cloud service for Julia developers to do this.