Is my understanding of Julia correct?

Memory Management

Garbage Collector

I noticed this issue when Chris Lattner pointed during a [debate]. In short, Julia uses the mark-and-sweep Garbage Collector. It may cause memory not being released in time. This is especially problematic for GPU since its memory is much smaller than CPU memory. The good news is there is some [workaround], but it would be better if Julia has some deterministic memory management. There is an [ongoing discussion about this].

Heap or Stack

As a high level language, Julia also does not allow the programmer to explicitly control whether the memory should be allocated on stack or heap. Instead, it is up to the complier to optimize for you.

Speed

I guess most people come to Julia for speed. However, to get the optimal speed, it is not that straightforward.

Avoid allocation

One problem is to avoid heap allocation. As I mentioned in the previous section. Heap or stack allocation cannot be controlled explicitly. Then it is kind like play a guessing game with the compiler to avoid allocation. The good news is, there are some [rule of thumbs].

Type Stable

Another problem is type inference. Julia is a dynamic typing language; however, to get the best speed, you are encouraged to have type stability. My understanding of this is as following: You can you Julia just a dynamic type language (such as Python), and have all the easy-of-mind and fast-prototyping. However, when you need it to be as fast as possible, you can use Julia as if it is a static type language, take care of the type stability (kind of like the generic type in a static type language like C++). However, the only con I see is that unlike a static type language that does type check at compile time and yell at you when it fails at type check. You need to call a few macros yourself to check whether it is type stable.

Type Annotation

Note that type annotation does not help if the compiler can figure out a stable type. In that case, annotating it to a narrower type only makes your function less reusable.

Multiple Dispatch

For a dynamic type language, this is very rare to see. For example, Python does not have this feature at all. For a static type language (C++ for example), this is very common to see if the type is known at compile time. Julia and C++ can both do heavy optimization. When it is not known at compile time. C++ can do dynamic single dispatch with virtual table. C++ can also do runtime double dispatch with a design pattern. And I donā€™t hear people mention runtime mutiple (>2) dispatch in C++, but I think it is possible by implementing some kind of virtual table yourself. For this runtime/dynamic multiple dispatch case, I think Julia does a lookup at runtimes. So it is kind of similar to the vtable idea, but the implementation of the lookup is different.

Ahead of Time (AOT) Compilation vs. Just In Time (JIT) Compilation vs. Interpreter

Usually, static type languages are AOT compiled while dynamic language runs on an interpreter, and sometimes with a JIT compiler. Because you need to know the type to deeply optimize the code, static type languages can directly be optimized AOT, while dynamic typing languages can only to optimized at runtime when the type is known with a JIT. Julia is somewhere in between. It is [just ahead of time (JAOT)] compiled. Julia does type inference as early as it can. But for a dynamic type language, inevitably, the type cannot be inferred before runtime. The pro is that you have a deeply optimized dynamic typing language. The con is this JAOT takes a long time, and this is the well known time to first plot issue. This issue is not a big deal if you use Julia for deep learning. However, dynamic typing is something built in the design of the language, while you enjoy the ease of use, you also have to bear with the awkwardness it brings: 1) long JAOT compilation time; 2) hard to build into libraries and executables. Luckily, both of these two are being actively worked on ([PackageCompiler.jl], [StaticCompiler.jl]) but very difficult IMO.

P.S. I can only pose two links because I am a new user. My full blog is here. It is something I wrote a year ago, and I realize some of it is wrong so I updated it and want to verify my current understanding is right.

6 Likes

No, no and no.

Julia uses a generational garbage collector (which is better, an optimization), thatā€™s the more correct term. It doesnā€™t use mark-and-sweep, or at least thatā€™s a simplification. Yes, I believe thatā€™s the most correct term (plus generational), as apposed to mark-compact, which at least isnā€™t used (because of allowing calling C).

What is ā€œreleased in timeā€? Whatā€™s the hurry? :slight_smile: Yes, there are pros and cons (delayed deallocation isnā€™t a huge problem, or memory use, nor rather delaying closing files which which a GC doesnā€™t address, but e.g. Rust and destructors in C++ do), but lots of dead objects are released very quickly because of generational, which is its point, but not all. Itā€™s not so problematic for a GPU, because Julia already targets them (CUDA on of the official tiers, yes, strictly through a package/s), and then no GC is used, yes, with GPUs with GC would be problematic.

A lot of Julia code avoids the GC, and thus the issues, including some hard-realtime code, and for GPU use. But yes, you need to do that, limit yourself then.

Thatā€™s a good thing, since users then have a mental load (and may screw things up). Julia does escape-analysis, and can and does stack allocate.

I need to run, may answer more completely later, couldnā€™t read further. At least welcome to the Julia community. If you think Julia has a problem, youā€™re likely wrong, or at least thereā€™s a non-default workaround, e.g. static executables possible.

2 Likes

ā€œOptimalā€ is something not precisely defined. To get really optimal performance you need to optimize you code carefully, as in any other language. Meaning, to get the performance of a code that was optimized by an expert programmer in C++, you probably need to understand well how Julia works and the possible optimizations. In many cases achieving the best performance is easier in Julia than in other languages, in other cases it is harder. I would say that it is generally easier.

Yet, to get good performance is relatively easy, and you just need to follow a few rules of thumb (avoid non-constant globals, avoid slices that allocate, abstract containersā€¦). There are things that the very early Julia programmer may fail to do, but these tips should be more or less incorporated after a few weeks of Julia programming experience.

Yes, to have fast code you need to be careful with type stability. However, IMHO, this is so advertised that is a typical case of early optimization. By programming in Julia we become sort of obsessed with type stability and in many cases it is really irrelevant. The same goes for the allocations. But yes, when it comes to performance optimization these are important things. What happens in Python, for instance, is that there function barriers between the slow code and the fast code (written in other language). Perhaps this allows the programmer to be ā€œrelaxedā€ in the Python part, until the moment of transferring the data to the lower-level functions that need to be fast using specific data structures. With practice one can do all that in Julia, that is, be relaxed where performance does not matter, and take care of types and structures when passing the data to the functions that need to be performant.

9 Likes

In my experience, type stabilities are more of a constant struggle and continues to be the most important thing in making Julia code not horribly slow.

13 Likes

Well, you are in the forefront of developing packages with the ultimate performance :slight_smile: (thank you for that).

But for something like reading a bunch of data, sorting, ordering, organizing, etc, one may not really care about performance. That is not to say that it is a waste of time to be aware of types on doing all that, it can be useful for debugging and code clarity.

(maybe my previous phrase allowed some misinterpretation: I didnā€™t mean that type stability may be irrelevant for performance, I meant that performance may be irrelevant, and thus type stability should not be something to be obsessively pursued).

3 Likes

It can be hit or miss. Note that type stability can also make a huge impact on compile times, for example on Julia 1.7, type unstable, bar!(::Nothing) takes about 9s to compile:

julia> using RecursiveFactorization

julia> foo!(A::Matrix) = RecursiveFactorization.lu!(A)
foo! (generic function with 1 method)

julia> foo!(A) = A
foo! (generic function with 2 methods)

julia> bar!(x) = foo!(Ref{Any}(x)[])
bar! (generic function with 1 method)

julia> t = time_ns();

julia> @time bar!(nothing)
  0.000068 seconds (416 allocations: 29.359 KiB, 86.92% compilation time)

julia> 1e-9*(time_ns() - t)
9.088390749

Make bar! type stable, and itā€™s more than 400x faster to compile:

julia> using RecursiveFactorization

julia> foo!(A::Matrix) = RecursiveFactorization.lu!(A)
foo! (generic function with 1 method)

julia> foo!(A) = A
foo! (generic function with 2 methods)

julia> bar!(x) = foo!(x)
bar! (generic function with 1 method)

julia> t = time_ns();

julia> @time bar!(nothing)
  0.000000 seconds

julia> 1e-9*(time_ns() - t)
0.02120855

Note that @time unfortunately forces a lot of compilation that it doesnā€™t time, so you need to copy/paste the surrounding block to actually time compilation.

Thereā€™s all sorts of opportunities for problems to creep in. E.g., recently, I used Returns without realizing that Returns <: Function, which resulted in a 3x compile time regression and a 2x runtime regression. Branches returning different types is of course another ubiquitious problem.
If you depend on a lot of packages and work on a large codebase, it is unfortunately difficult to avoid.

And perhaps the runtime of the code isnā€™t important, but type instabilities sometimes have a substantial impact on compile time performance, making latency unacceptable slow.

While Iā€™m sometimes the source of the problem ā€“ Iā€™ve written closures and unwittingly passed functions as arguments ā€“ given how much time I also spend looking at other peopleā€™s code for type instabilities, Iā€™d prefer if they did pursue it obsessively. =)

Preferably, as a matter of principle rather than benchmark driven.

Why?
The same changes that dropped OrdinaryDiffEqā€™s compile time from 22 to 3 seconds introduced a seemingly innocuous type instability (inside a function barrier) resulting in a 50% increase in compile time of our code using OrdinaryDiffEq.
That was unfortunately far from the only exampleā€¦
Something that works well or improves the situation in a small example can and does blow up and go the other way in a larger example.

EDIT:
But I do agree when it comes to scripts vs libraries.
Libraries are hard to view in isolation, but scripts and end-user apps arenā€™t.

16 Likes

This one could use a bit of a caveat, in that typically mutable structs are heap allocated and typically immutable structs are stack allocated, but the compiler is free to optimize things either way or even remove them entirely (if it can do so unobservably). See, e.g., A nice explanation of memory stack vs. heap - #2 by sylvaticus and the subsequent posts.

8 Likes

These are some good examples of a fundamental challenge: that in Julia the burden of hitting the right semantics for the optimizer has been shifted largely from the type system (incl checker) to the user, vs static languages. As you point out, itā€™s even harder when considering compositionality, which could result in playing optimizer whack-a-mole. Tkf calls this aptly ā€˜programming in a optimizer defined sub-dialectā€™. Things grow more acute as the set of optimizations and transforms grows beyond what was initially co-designed with the type system / language, like AD, GPU codegen, escape analysis/memory elision and various combinations of the aforementioned. (see here for more of my thoughts on that for ML specifically).

I wonder if there is a general solution to this. Can there be domain specific opt in static typing or will improvement in tooling like JET help much? (Have you tried JET?)

The extensible static typing route could be particularly interesting for julia as we arenā€™t locked into one static system and all the tradeoffs that entails. Though maybe this is very wrong? I donā€™t know.

But perhaps the designs around user defined compiler passes should be paired with some system for composable user defined semantic invariants.

I know @cscherrer has similar concerns has he relies heavily on type based compile time programming in soss.jl with GitHub - JuliaStaging/GeneralizedGenerated.jl: A generalized version of Julia generated functions @generated to allow closures in generated functions and avoid the use of runtime eval or invokelatest. , but that apparently may be an antipattern. If so, what is the replacement?

5 Likes

Vectors are by default stored on the heap. If the vector is reasonably small (< 100 elements) then you can use Static Vectors, which are stored on the stack. The naming is odd - the heap always feels more static to me - Static Vectors are so named because the vector size is static, you canā€™t push onto a Static Vector.

https://docs.julialang.org/en/v1/manual/performance-tips/

1 Like

This seems to be going ok, but the large breadth of topics risks breaching this policy POLICY: feedback discussion splitting. If things start heating up, we may lock the thread and request splitting it into more focused topics.

4 Likes

Julia now has escape analysis, although the optimizer isnā€™t use it yet.
Once it starts using it, weā€™ll be getting mutable structs on the heap more and more often.
Juliaā€™s base Array will also probably start appearing on the stack a lot eventually Move small arrays to the stack by pchintalapudi Ā· Pull Request #43573 Ā· JuliaLang/julia Ā· GitHub
but this, too, is dependent on the array not escaping (again meaning we need the escape analysis for this to start being really common).
[For any anyone curious why escape analysis is essential: stack memory is only valid within a scope. Meaning any reference to the stack you have later will probably be corrupt, hence you cannot allow stack references ā€“ and thus any object allocated on the stack ā€“ to ā€œescapeā€. Regular structs work around this via copying the memory when needed.]

@Akatz I strongly agree with your comments here, and would very much like to see some burden lifted off the programmer. I should try playing with JET more.
Iā€™d love to see a more static language.
I donā€™t think that poses any limitations on the REPL, e.g. you have no problems changing the types of variables in Rust, as redefining with let x = ... simply shadows the old binding.

6 Likes

Is it worth splitting this off into a separate discussion? Iā€™d love to hear thoughts from some of the core team if possible and would be helpful to gather feedback from users in one place. I know @jpsamaroo and @tkf are into the JET/ dynamic semantics side of things while you and @Elrod are interested in exploring more from the type system. @ckfinite Is also working on static typing. Sounds ripe for a broader discussion to work out the pros and cons of various design approaches.

Most of my thinking here was strongly driven by @Keno 's various type lattice explorations.

2 Likes

Sounds ripe for a broader discussion to work out the pros and cons of various design approaches.

The interaction between static typing alone and type stability isnā€™t as straightforward as it seems; due to abstract types (be they unionalls, unions, or just normal abstract types) itā€™s wholly possible for a statically-well-typed program to be unstable. An additional analysis is needed beyond just type safety to ensure stability, which has considerable additional complexity of its own (and is, like type safety itself, at best going to be incomplete).

My colleague @ulysses is working on a static analysis for stability now, based out of our earlier work (paper here) on type stability itself, and might be able to chime in.

One topic that Iā€™m not sure has been discussed about type stability is that some relatively simple optimizations (polymorphic inline caching, in particular, and dispatch stubs more generally) may be able to dramatically improve performance of type unstable code and stave off the need for full inference and compilation. However, it breaks the beautiful simplicity of Juliaā€™s execution model.

5 Likes

IMHO the discussion on type stability became so technical that would deserve splitting even without the heating :slight_smile:

Yes, of course, we are talking from a different perspective. The OP is concerned with ML applications. Of course everything here may apply, particularly for scientific ML, but for most applications a little bit of care and following the performance tips is good enough. Particularly when most of the computational cost is within the libraries, you just need to write code that doesnā€™t mess up with the internal representation of the data in the library.

(I donā€™t think anyone disagrees here, it is just that I have the feeling that the thread became a little bit overwhelming for the general reader).

1 Like

Thank everybody for the replies and discussions, although I cannot understand some of them now, but thank you for pointing out these advanced topics to look at.

Thanks for the reply. I am not trying to say Julia has a problem. However, I do think there is no perfect language for everything. Understanding some of the design choices helps me to see what a language is good at and what it is less ideal for.

2 Likes

A quick question if I may ask: As someone who tries to get the best performance out of Julia, you seem to prefer some static type language features. I am wondering when you aiming for performance, currently, is Julia harder to work with compared to a static type language, such as C++ or Rust? And maybe why?

I think what people call ā€œmutate-or-widenā€ approach may be somewhat relevant: Tail-call optimization and function-barrier -based accumulation in loops. A similar idea is applied to [ANN] Catwalk.jl - With dynamic dispatch to the moon! (an adaptive optimizer, aka JIT compiler)

I call the generalized version of it a ā€œtail-call function-barrierā€ approach and use it extensively in JuliaFolds to help type-stability of otherwise unstable for loops (and iterations in general). But, more importantly, I think this approach is particularly attractive because even unstable code dynamically type-stabilizes. I think this approach is an example that demonstrates the dynamic-and-efficient aspect of Julia.

That said, using this principle in practice is very manual and cumbersome at the moment. JuliaFolds hide the complexity in many cases. But Iā€™ve been wanting to play with the compiler to make it more automatic.

Even Haskell and nowadays even C++ have REPL. So, I agree that REPL would be a blocker. Itā€™s a bit tangent, but, rather, I think the important aspect is that we can freely write ā€œbrokenā€ code easily (which is the greatest strength of dynamic languages; extremely rapid feedback). Interestingly, Haskell has -fdefer-type-errors for ā€œdisablingā€ type check and also Iā€™ve heard Roc language tries to incorporate this idea more seriously. Perhaps dynamic and static languages meet in the middle in the future. On the dynamic language side, there are examples like Python which has mypyc project for compiling type-annotated code. But I think Julia is a very rare case where the optimizability of dynamic code has been the design goal from the getgo. I guess itā€™s reasonable to hope there is a ā€œstatic sub-languageā€ in Julia waiting to be discovered.

Just to be clear, Iā€™m not at all against defining sub-language that is statically compiled and possibly executable without runtime. My main comment has been clarifying the current status of Julia as a language, especially on what is (not) guaranteed. (But I personally like playing with ā€œdynamic but optimizable semanticsā€ and so I may be exaggerating this aspect.)

2 Likes

Thatā€™s a neat pattern! Yes, thatā€™s more or less hand-rolled monomorphic inline caching, where we assume that the call site will only ever need to deal with a single type. Wikipedia has a nice example of it. Itā€™s fairly readily possible to extend it to manual polymorphic inline caching with generated functions, too, I think.

I didnā€™t know the word ā€œmegamorphic.ā€ Sounds useful :slight_smile:

Yes, thatā€™s what @tisztamo did in the aforementioned Catwalk.jl (plus more fancy things like call frequency-based optimizations), IIRC.