Is my understanding of Julia correct?

“Optimal” is not precisely defined. To get truly optimal performance you need to optimize your code carefully, as in any other language. That is, to match the performance of code optimized by an expert C++ programmer, you probably need to understand well how Julia works and the possible optimizations. In many cases achieving the best performance is easier in Julia than in other languages; in other cases it is harder. I would say it is generally easier.

Yet getting good performance is relatively easy: you just need to follow a few rules of thumb (avoid non-constant globals, allocating slices, abstract containers…). There are things the very early Julia programmer may fail to do, but these tips are usually internalized after a few weeks of Julia programming experience.
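As a quick illustration of those three rules of thumb (the function and variable names here are my own, invented for the example):

```julia
# Rule 1: non-constant globals are slow inside functions; use `const`.
const SCALE = 2.0                      # fine to reference in hot functions

# Rule 2: a slice like v[2:end] allocates a copy; a view does not.
sum_slice(v) = sum(v[2:end])           # allocates a temporary array
sum_view(v)  = sum(@view v[2:end])     # no allocation

# Rule 3: prefer concretely typed containers over abstract ones.
abstract_container = Real[1, 2.0]      # elements are boxed
concrete_container = Float64[1, 2.0]   # unboxed, fast storage
```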

Yes, to have fast code you need to be careful with type stability. However, IMHO, this is so heavily advertised that it has become a typical case of premature optimization. By programming in Julia we become sort of obsessed with type stability, and in many cases it is really irrelevant. The same goes for allocations. But yes, when it comes to performance optimization these are important things. What happens in Python, for instance, is that there are function barriers between the slow code and the fast code (written in another language). Perhaps this allows the programmer to be “relaxed” in the Python part, until the moment of transferring the data to the lower-level functions that need to be fast, using specific data structures. With practice one can do all that in Julia, that is, be relaxed where performance does not matter, and take care of types and structures when passing the data to the functions that need to be performant.
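A minimal sketch of that workflow in Julia (the function names are hypothetical): keep the “relaxed”, possibly type-unstable code in a driver, and convert to concrete types right before crossing into the hot kernel:

```julia
# The kernel is compiled for the concrete type Vector{Float64}, so the
# loop is fast regardless of how messy the caller's types are.
function kernel(data::Vector{Float64})
    s = 0.0
    for x in data
        s += x * x
    end
    return s
end

# The driver can accept anything iterable; it normalizes to a concrete
# type and then crosses the function barrier.
function relaxed_driver(raw)
    data = Float64.(collect(raw))
    return kernel(data)
end
```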


In my experience, type stability is more of a constant struggle, and it continues to be the most important thing in keeping Julia code from being horribly slow.


Well, you are in the forefront of developing packages with the ultimate performance :slight_smile: (thank you for that).

But for something like reading a bunch of data, sorting, ordering, organizing, etc, one may not really care about performance. That is not to say that it is a waste of time to be aware of types on doing all that, it can be useful for debugging and code clarity.

(maybe my previous phrase allowed some misinterpretation: I didn’t mean that type stability may be irrelevant for performance, I meant that performance may be irrelevant, and thus type stability should not be something to be obsessively pursued).


It can be hit or miss. Note that type stability can also have a huge impact on compile times. For example, on Julia 1.7, the type-unstable bar!(::Nothing) takes about 9 s to compile:

julia> using RecursiveFactorization

julia> foo!(A::Matrix) = RecursiveFactorization.lu!(A)
foo! (generic function with 1 method)

julia> foo!(A) = A
foo! (generic function with 2 methods)

julia> bar!(x) = foo!(Ref{Any}(x)[])
bar! (generic function with 1 method)

julia> t = time_ns();

julia> @time bar!(nothing)
  0.000068 seconds (416 allocations: 29.359 KiB, 86.92% compilation time)

julia> 1e-9*(time_ns() - t)

Make bar! type stable, and it’s more than 400x faster to compile:

julia> using RecursiveFactorization

julia> foo!(A::Matrix) = RecursiveFactorization.lu!(A)
foo! (generic function with 1 method)

julia> foo!(A) = A
foo! (generic function with 2 methods)

julia> bar!(x) = foo!(x)
bar! (generic function with 1 method)

julia> t = time_ns();

julia> @time bar!(nothing)
  0.000000 seconds

julia> 1e-9*(time_ns() - t)

Note that @time unfortunately forces a lot of compilation that it doesn’t time, so you need to copy/paste the surrounding block to actually time compilation.

There’s all sorts of opportunities for problems to creep in. E.g., recently, I used Returns without realizing that Returns <: Function, which resulted in a 3x compile-time regression and a 2x runtime regression. Branches returning different types are of course another ubiquitous problem.
If you depend on a lot of packages and work on a large codebase, it is unfortunately difficult to avoid.
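To make the “branches returning different types” problem concrete with a toy example (not taken from the code discussed above):

```julia
# The two branches return Int and Float64, so the inferred return type
# is Union{Int64, Float64}; downstream callers may have to box the
# result and dispatch dynamically on it.
unstable(x) = x > 0 ? 1 : 0.0

# Making both branches return the same concrete type restores stability.
stable(x) = x > 0 ? 1.0 : 0.0

# `@code_warntype unstable(1)` highlights the Union in its output.
```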

And perhaps the runtime of the code isn’t important, but type instabilities sometimes have a substantial impact on compile-time performance, making latency unacceptably slow.

While I’m sometimes the source of the problem – I’ve written closures and unwittingly passed functions as arguments – given how much time I also spend looking at other people’s code for type instabilities, I’d prefer if they did pursue it obsessively. =)

Preferably, as a matter of principle rather than benchmark driven.

The same changes that dropped OrdinaryDiffEq’s compile time from 22 to 3 seconds introduced a seemingly innocuous type instability (inside a function barrier) resulting in a 50% increase in compile time of our code using OrdinaryDiffEq.
That was unfortunately far from the only example…
Something that works well or improves the situation in a small example can and does blow up and go the other way in a larger example.

But I do agree when it comes to scripts vs libraries.
Libraries are hard to view in isolation, but scripts and end-user apps aren’t.


This one could use a bit of a caveat, in that typically mutable structs are heap allocated and typically immutable structs are stack allocated, but the compiler is free to optimize things either way or even remove them entirely (if it can do so unobservably). See, e.g., A nice explanation of memory stack vs. heap - #2 by sylvaticus and the subsequent posts.
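A small sketch of that rule of thumb (the struct names are invented, and whether the compiler elides a particular allocation can vary by Julia version):

```julia
struct ImmutablePoint        # immutable: typically stack-allocated / kept in registers
    x::Float64
    y::Float64
end

mutable struct MutablePoint  # mutable: typically heap-allocated
    x::Float64
    y::Float64
end

# Creating many immutable points usually costs no heap allocations, while
# the mutable version usually allocates once per point; `@allocated` can
# confirm what happens on a given Julia version.
sum_x_immutable(n) = sum(ImmutablePoint(i, -i).x for i in 1:n)
sum_x_mutable(n)   = sum(MutablePoint(i, -i).x for i in 1:n)
```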


These are some good examples of a fundamental challenge: in Julia the burden of hitting the right semantics for the optimizer has been shifted largely from the type system (incl. checker) to the user, vs. static languages. As you point out, it’s even harder when considering compositionality, which can result in playing optimizer whack-a-mole. @tkf aptly calls this ‘programming in an optimizer-defined sub-dialect’. Things grow more acute as the set of optimizations and transforms grows beyond what was initially co-designed with the type system / language, like AD, GPU codegen, escape analysis/memory elision, and various combinations of the aforementioned. (See here for more of my thoughts on that for ML specifically.)

I wonder if there is a general solution to this. Can there be domain-specific, opt-in static typing, or will improvements in tooling like JET help much? (Have you tried JET?)

The extensible static typing route could be particularly interesting for julia as we aren’t locked into one static system and all the tradeoffs that entails. Though maybe this is very wrong? I don’t know.

But perhaps the designs around user defined compiler passes should be paired with some system for composable user defined semantic invariants.

I know @cscherrer has similar concerns, as he relies heavily on type-based compile-time programming in Soss.jl with GitHub - JuliaStaging/GeneralizedGenerated.jl: A generalized version of Julia generated functions @generated to allow closures in generated functions and avoid the use of runtime eval or invokelatest, but that apparently may be an antipattern. If so, what is the replacement?


Vectors are by default stored on the heap. If the vector is reasonably small (< 100 elements or so) you can use static vectors (SVector from StaticArrays.jl), which are stored on the stack. The naming is odd - the heap always feels more static to me - static vectors are so named because the vector size is static: you can’t push! onto an SVector.
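Assuming the StaticArrays.jl package is available, a quick sketch of the distinction:

```julia
using StaticArrays

v = SVector(1.0, 2.0, 3.0)   # the length 3 is part of the type: SVector{3, Float64}
w = 2 .* v                   # returns a new SVector; no heap allocation
# push!(v, 4.0)              # would error: the size of an SVector is fixed

u = [1.0, 2.0, 3.0]          # ordinary Vector: heap-allocated, resizable
push!(u, 4.0)                # fine
```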


This seems to be going ok, but the large breadth of topics risks breaching this policy POLICY: feedback discussion splitting. If things start heating up, we may lock the thread and request splitting it into more focused topics.


Julia now has escape analysis, although the optimizer isn’t using it yet.
Once it starts using it, we’ll be getting mutable structs on the stack more and more often.
Julia’s base Array will also probably start appearing on the stack a lot eventually Move small arrays to the stack by pchintalapudi · Pull Request #43573 · JuliaLang/julia · GitHub
but this, too, is dependent on the array not escaping (again meaning we need the escape analysis for this to start being really common).
[For anyone curious why escape analysis is essential: stack memory is only valid within a scope. Meaning any reference to the stack you hold later will probably be corrupt, hence you cannot allow stack references – and thus any object allocated on the stack – to “escape”. Regular structs work around this by copying the memory when needed.]

@Akatz I strongly agree with your comments here, and would very much like to see some burden lifted off the programmer. I should try playing with JET more.
I’d love to see a more static language.
I don’t think that poses any limitations on the REPL, e.g. you have no problems changing the types of variables in Rust, as redefining with let x = ... simply shadows the old binding.


Is it worth splitting this off into a separate discussion? I’d love to hear thoughts from some of the core team if possible, and it would be helpful to gather feedback from users in one place. I know @jpsamaroo and @tkf are into the JET / dynamic semantics side of things, while you and @Elrod are interested in exploring more from the type system. @ckfinite is also working on static typing. Sounds ripe for a broader discussion to work out the pros and cons of various design approaches.

Most of my thinking here was strongly driven by @Keno 's various type lattice explorations.


Sounds ripe for a broader discussion to work out the pros and cons of various design approaches.

The interaction between static typing alone and type stability isn’t as straightforward as it seems; due to abstract types (be they unionalls, unions, or just normal abstract types) it’s wholly possible for a statically-well-typed program to be unstable. An additional analysis is needed beyond just type safety to ensure stability, which has considerable additional complexity of its own (and is, like type safety itself, at best going to be incomplete).
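A toy example of that point (the types are invented for illustration): the following is perfectly well-typed in the static sense, since every call has a matching method, yet the call through the abstract element type is type-unstable:

```julia
abstract type Shape end
struct Circle <: Shape; r::Float64; end
struct Square <: Shape; s::Float64; end

area(c::Circle) = π * c.r^2
area(sq::Square) = sq.s^2

# Well-typed, but `area` on an element of a Vector{Shape} is a dynamic
# dispatch: the abstract eltype means the concrete method (and hence the
# concrete return type) cannot be determined statically.
total_area(shapes::Vector{Shape}) = sum(area, shapes)
```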

My colleague @ulysses is working on a static analysis for stability now, based out of our earlier work (paper here) on type stability itself, and might be able to chime in.

One topic that I’m not sure has been discussed regarding type stability is that some relatively simple optimizations (polymorphic inline caching in particular, and dispatch stubs more generally) may be able to dramatically improve the performance of type-unstable code and stave off the need for full inference and compilation. However, it would break the beautiful simplicity of Julia’s execution model.
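As a rough, hand-written sketch of what such a cache buys (this is manual union-splitting, not the actual runtime machinery): checking the common concrete types first lets the compiler devirtualize those branches, leaving dynamic dispatch only for the rare megamorphic case:

```julia
process(x::Int)     = x + 1
process(x::Float64) = x + 0.5
process(x)          = x            # generic fallback

function process_any(x)
    if x isa Int                   # fast path: statically dispatched
        return process(x)
    elseif x isa Float64           # another fast path
        return process(x)
    else
        return process(x)          # slow path: dynamic dispatch
    end
end
```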


IMHO the discussion on type stability became so technical that it would deserve splitting even without the heating :slight_smile:

Yes, of course, we are talking from different perspectives. The OP is concerned with ML applications. Of course everything here may apply, particularly for scientific ML, but for most applications a little bit of care and following the performance tips is good enough. Particularly when most of the computational cost is within the libraries, you just need to write code that doesn’t mess with the internal representation of the data in the library.

(I don’t think anyone disagrees here, it is just that I have the feeling that the thread became a little bit overwhelming for the general reader).


Thanks everybody for the replies and discussion. I cannot understand some of them yet, but thank you for pointing out these advanced topics to look into.

Thanks for the reply. I am not trying to say Julia has a problem. However, I do think there is no perfect language for everything. Understanding some of the design choices helps me to see what a language is good at and what it is less ideal for.


A quick question if I may ask: As someone who tries to get the best performance out of Julia, you seem to prefer some statically typed language features. I am wondering: when you are aiming for performance, is Julia currently harder to work with than a statically typed language such as C++ or Rust? And if so, why?

I think what people call “mutate-or-widen” approach may be somewhat relevant: Tail-call optimization and function-barrier -based accumulation in loops. A similar idea is applied to [ANN] Catwalk.jl - With dynamic dispatch to the moon! (an adaptive optimizer, aka JIT compiler)

I call the generalized version of it a “tail-call function-barrier” approach and use it extensively in JuliaFolds to help type-stability of otherwise unstable for loops (and iterations in general). But, more importantly, I think this approach is particularly attractive because even unstable code dynamically type-stabilizes. I think this approach is an example that demonstrates the dynamic-and-efficient aspect of Julia.

That said, using this principle in practice is very manual and cumbersome at the moment. JuliaFolds hide the complexity in many cases. But I’ve been wanting to play with the compiler to make it more automatic.
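A hand-rolled sketch of the idea (my own simplified version, not the JuliaFolds implementation): accumulate into a concretely typed vector, and when an element of a new type shows up, widen the accumulator and recurse; the recursive call acts as a function barrier, so the loop body is compiled for the current concrete accumulator type:

```julia
# `acc` is concretely typed, so the `push!` loop is type-stable. When an
# element doesn't fit, we widen via `vcat` (which promotes the eltype)
# and tail-call ourselves, re-entering a freshly specialized method.
function collect_widen!(acc::Vector{T}, iter, state...) where {T}
    y = iterate(iter, state...)
    while y !== nothing
        x, s = y
        if x isa T
            push!(acc, x)
        else
            return collect_widen!(vcat(acc, [x]), iter, s)
        end
        y = iterate(iter, s)
    end
    return acc
end
```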

Even Haskell, and nowadays even C++, have REPLs. So, I agree that the REPL wouldn’t be a blocker. It’s a bit of a tangent, but, rather, I think the important aspect is that we can freely write “broken” code easily (which is the greatest strength of dynamic languages; extremely rapid feedback). Interestingly, Haskell has -fdefer-type-errors for “disabling” the type check, and I’ve heard the Roc language tries to incorporate this idea more seriously. Perhaps dynamic and static languages will meet in the middle in the future. On the dynamic language side, there are examples like Python, which has the mypyc project for compiling type-annotated code. But I think Julia is a very rare case where the optimizability of dynamic code has been a design goal from the get-go. I guess it’s reasonable to hope there is a “static sub-language” in Julia waiting to be discovered.

Just to be clear, I’m not at all against defining sub-language that is statically compiled and possibly executable without runtime. My main comment has been clarifying the current status of Julia as a language, especially on what is (not) guaranteed. (But I personally like playing with “dynamic but optimizable semantics” and so I may be exaggerating this aspect.)


That’s a neat pattern! Yes, that’s more or less hand-rolled monomorphic inline caching, where we assume that the call site will only ever need to deal with a single type. Wikipedia has a nice example of it. It’s fairly readily possible to extend it to manual polymorphic inline caching with generated functions, too, I think.

I didn’t know the word “megamorphic.” Sounds useful :slight_smile:

Yes, that’s what @tisztamo did in the aforementioned Catwalk.jl (plus more fancy things like call frequency-based optimizations), IIRC.

Not me. I came for expressiveness, lack of OOP, and JuMP. Julia enables me to express my programming ideas better than any of the other languages I have tried over 40 years of programming (C, C++, Python, Limbo, Javascript).

Sort of, in that rebinding variables, as in a = 1; a = "1", is valid code;
but not so in that code like a = 1 + "1" will fail, unlike, say, PHP (where it evaluates to 2) or JavaScript (where it evaluates to “11”).


Julia is dynamically typed. For more info, see the answer at