"A Tragedy of Julia’s Type System"

Now that the main discussion has died down, I just wanted to address this tangent. The problem happens when you capture a variable that is assigned at multiple points in the scope from which it’s captured. Defining local_x = x works because it introduces a new variable that only has a single point of assignment and is therefore safe to capture.
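
A minimal sketch of both cases (variable names are my own):

function example()
    x = 1
    f = () -> x + 1        # x is assigned again below, so lowering boxes it; f() infers as Any
    x = 2

    local_x = x            # a new variable with exactly one point of assignment
    g = () -> local_x + 1  # captured cleanly; g() infers as Int
    return f(), g()
end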

There are innumerable Discourse threads about this, as well as this manual section and the infamous issue #15276, so we don’t need to import any further debate into this thread. I just wanted to clarify the key point, since it can be surprisingly easy to miss in all the ink spilled on this issue.

2 Likes

The commenter who made the quicksort/runtime-dispatch benchmark and questionably concluded that “untyped” Julia is 5x slower than Python is indeed the author of the blog; I’d hazard a guess that this mutated into the 5x comparison of runtime dispatches alone. To save anyone else the trouble of reading that whole thread: the benchmark was questioned over the lack of control over the number of dispatches in either the Python or Julia version, the inability to separate the benchmarking of the sorting from the dispatches of interest, and the translation becoming inequivalent in order to force poor type inference. There is a good point there about caching of runtime-dispatched code possibly distorting a benchmark; I wonder if I could improve my earlier + benchmark. Maybe Python’s repeated bound-method instantiation is slow because it’s less cache-friendly.

The more I read Medium, the more I realize it really isn’t a place conducive to discourse. Good for examples and education. Good for theorycrafting. Not very useful for planning or critiquing.

I agree that there are oddities with Julia’s type system that can be unintuitive to work around, or that push you toward writing inflexible code (because ultimately, you can coerce types and statically annotate everything if it bothers you that much).

But from my perspective, as someone who works more in science than in programming, with a moderate programming background, I cannot fully express how game-changing Julia is for scientists.

An underrated piece of Julia for a lot of programming people is that it’s “Julia all the way down.” To the programmer this is insignificant, but to the scientist it is MASSIVE. I know so many scientists who refuse to contribute to packages because they don’t understand C++ and don’t have the time to learn it. Julia gives them a way to contribute without necessarily needing to know C++.

I also just want to share a reminder that, while NNs and ML have taken over computer science, there are still a LOT of people doing other things. I’m fine if an NN person finds Julia doesn’t work for their needs. I find I prefer R for plotting and geospatial work. That isn’t a reason to write off Julia wholesale, or any other language for that matter; each one does something it set out to achieve. Julia set out to fix the two-language problem and has been fairly successful at that. Is it perfect? Nothing is. But without Julia, I couldn’t have cut an analysis runtime by 95% while still being able to share it and get collaborators up to speed within a day. That is really valuable and should not be understated.

This is partly why I’m getting more into contributing to Julia. It has lots of promise for science, and it won’t get better without engagement. Python wasn’t built in a day either. That’s okay.

44 Likes

I’m fine with Python/C++ having achieved dominance in the NN space purely because of the network effects of a large ecosystem. But as a regular Julia user, I worry whether there are actually valid technical reasons why Julia is not as competitive for supercomputing-scale workloads like GPT model training. I don’t believe type stability is such a reason. (In any case, C++ variants are a bigger tragedy than Julia’s union types can ever be.) However, I wonder if C++'s RAII and Python’s reference counting are essential for keeping memory use under control when processing large datasets. If you want to bypass the GC in Julia, there are packages like ManualMemory.jl and StaticTools.jl, which work fine for arrays of primitive types but don’t work for complex data structures, say, dictionaries of mutable structs. My personal experience (not related to NNs) is that Julia code can easily approach C++ levels of performance in speed, but not in memory efficiency, unless you can completely eliminate heap allocation from the main loop (which can be hard).
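
To illustrate that last point with a sketch of my own (the function and variable names are made up): the usual trick is to preallocate buffers outside the hot loop and mutate them in place, which @allocated can verify:

function sqrtabs!(out, data)
    for i in eachindex(data)
        out[i] = sqrt(abs(data[i]))  # in-place update, no heap allocation
    end
    return out
end

data = rand(10^6)
out = similar(data)               # allocate once, outside the hot loop
sqrtabs!(out, data)               # warm up compilation
@allocated sqrtabs!(out, data)    # expect 0 bytes if the loop is allocation-free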

10 Likes

^this.

The basic rules, if you create a closure that captures x, are:

  1. Remember that the (smart) compiler never gets to optimize capturing. That happens during lowering, done by a very stupid, hackish piece of code that operates on the AST and doesn’t even get to see the real control-flow graph.
  2. Use single static assignment (i.e. x syntactically appears on the left-hand side of an assignment exactly once).
  3. Don’t use @goto; that tends to confuse lowering!
  4. Make sure that this single static assignment dominates the definition of your closure, and that this definition dominates the call.

This is super annoying, because for most code you’re used to a pretty smart compiler/optimizer, but the rules for getting good closures are completely different (because the lowering pass doing closure boxing is so very stupid).
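
A minimal sketch of rule 2 in practice (my own example): reassigning a captured variable gets it boxed, while routing the mutation through a Ref keeps the variable itself singly assigned:

function make_counter_boxed()
    n = 0
    inc() = (n += 1)    # n is reassigned inside the closure => boxed, infers as Any
    return inc
end

function make_counter()
    n = Ref(0)          # exactly one assignment to n; mutation goes through the Ref
    inc() = (n[] += 1)  # n itself is never reassigned => captured as a plain Ref{Int}
    return inc
end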

4 Likes

Also, I think everyone agrees that’s a quirk (or bug), but it doesn’t have to be that way, and there is a chance it could be solved in the future.

2 Likes

Stupid, but not trivially so. Fundamentally, a call-wise compiler can only go so far for variables that exist across multiple calls. It has to get them “right” on the first enclosing call, or else other calls will error or have undefined behavior; we often don’t have the input argument types needed to infer those calls by the time the first call happens. There’s definitely room for improvement:

  1. If we only reassign instances computed from global or other captured variables, that can be inferred on the first call.
  2. If the closure can be proven not to escape (erring on the side of cautiously assuming it does), its calls can be inferred as if inlined.
  3. It’d be nice to write type-stable mutually recursive closures, even with no reassignments.

But a closure like f = let x = 1; _f(y) = (x=y) end cannot be type stable unless an AOT compiler sees all the calls we’ll ever make and detects that all inputs are y::Int.
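
Spelling that last example out (my own expansion): the captured x takes whatever type callers pass in, so no fixed box type can be chosen without seeing every call site:

f = let x = 1
    _f(y) = (x = y)  # reassigns the captured x on every call
end

f(2)      # x now holds an Int
f("two")  # x now holds a String; the box backing x must accept Any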

Julia has many analysis tools that can analyze your code. You can descend into any function (e.g. with Cthulhu.jl’s @descend) to track down the issue.

So, at least @Benny is of the opinion that it has to be that way, i.e. that inference of non-SSA captures will always and unavoidably suck.

But additionally, it sucks extra hard, because the lowering is dumb. E.g.

function f()
    x = 0
    if false x = 1 end   # dead branch, but lowering still sees two assignments to x
    g() = x
    g()
end

gleefully infers as Any. This is because lowering is dumb and only sees the AST; it gets neither the CFG, nor constant propagation, nor escape analysis, nor anything else!

I tend to agree with @Benny: the problem is hard, and it will always suck. But it doesn’t need to suck that badly.

Regardless, it has been this way for 8+ years. I think it is disingenuous to pretend that boxing of captures will be solved “with the attack Keno”.

Even if we somehow got much better optimization by deferring boxing decisions to later optimization stages that have the CFG, type inference, and escape analysis available, the semantics of captures mean that we will still box a lot.
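
A sketch of why the semantics force this (my own example): two closures sharing a mutable binding must both observe every update, so the shared slot has to live in a heap cell:

function shared()
    counter = 0
    bump() = (counter += 1)  # both closures must see the same binding,
    peek() = counter         # so lowering puts counter in a shared Box
    bump()
    return peek()            # 1
end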

I see a much brighter future for a command-line switch that emits warnings for all boxed captures – i.e. for everything that is not capture-by-value.

And then a slow community shift toward always enabling that switch, and thus a transition to a Julia dialect/subset that has capture-by-value.

(Due to macros and generated functions, boxing cannot be statically decided/linted; you really must hook the compiler in order to warn users that they wrote really bad code.)

“Can only go so far” did not imply “can’t go anywhere”. In the list following the words “room for improvement” in my comment you’re quoting, points (1) or (2) would accomplish type stability for your if false x = 1 end example, no dead-code elimination needed (such obvious dead code is so limited and brittle that it should be removed manually rather than automatically).

1 Like

I played with the code example you gave. It looks like the variable x is always boxed when the if statement is used with a closure.

I tried giving a type declaration like x::Int, but it does not help. I have to either replace the if statement with the function ifelse(), or replace the closure with

g = let x = x
    () -> x
end

Then the boxed variable and the Any are gone.

Surprisingly, g(; x=x) = x does not work. I used to think this was equivalent to using the let block. What’s happening here?

It doesn’t solve it, but it’s a lot easier to automatically flag such issues with DispatchDoctor.jl, btw:

julia> using DispatchDoctor: @stable

julia> function f()
           x = 0
           if false x = 1 end
           @stable g()=x
           g()
       end
f (generic function with 1 method)

julia> f()
ERROR: DispatchDoctor.TypeInstabilityError: Instability detected in `g` defined at REPL[2]:4. Inferred to be `Any`, which is not a concrete type.

9 Likes

Yes, like foobar_lv2 said, boxing happens when the expression gets lowered. The closure gets an underlying struct, and the captured variable’s field gets annotated ::Core.Box because of the reassignment. The compiler can’t undo that; the lowerer has to get smarter and cooperate. Tip: use dump or fieldnames+fieldtypes on the underlying struct to see this.
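
For instance, with a variant of the earlier f that returns the closure itself (a sketch, assuming the same definition):

function f()
    x = 0
    if false x = 1 end
    g() = x
    return g                # return the closure so we can inspect it
end

g = f()
dump(g)                     # shows the captured field: x::Core.Box
fieldtypes(typeof(g))       # (Core.Box,)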

The second x in g(; x=x) = x is still the reassigned outer variable, and it’s still captured; the same conditions for boxing apply.

It does help in another way: it adds a convert and a typeassert on every access to the box, so despite the load overhead, subsequent code can assume the annotated type for the captured variable. I think it should be possible to use the annotation to type the box itself (like going from Ref{Any} to Ref{Int}) and reduce the overhead, but I don’t know if I’m missing some practical hurdles.
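
For example (my own illustration of that point): the annotation doesn’t remove the Box, but each read of it gets the convert/typeassert, so downstream code can assume Int:

function f_annotated()
    x::Int = 0
    if false x = 1 end
    g() = x + 1  # x is still boxed, but every read of the box is
                 # converted/asserted to Int, so g() can be inferred
    return g()
end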

Someone should probably split this into yet another thread about the performance of captured variables, the blog didn’t even address this.

For what it’s worth, there’s an open PR under active development specifically trying to fix cases like this: perform inference using optimizer-derived type information by aviatesk · Pull Request #56687 · JuliaLang/julia · GitHub

The idea is that, after things like type inference and dead-code elimination, the compiler can go back to a very early point in the pipeline and use knowledge it learned later to say:

ah, x’s type never changes and can’t be lexically modified anywhere in the program, so it’s okay to just let g return 0 instead of forcing it to go through a Box

I’m pretty optimistic this can work. Specifically, check out the tests the PR adds showing it can correctly infer return types in very similar functions: julia/Compiler/test/inference.jl at b775f2008d944c376f4f3f9f48ff2a096eb66a74 · JuliaLang/julia · GitHub

These tests are already passing.

13 Likes