Can Julia achieve fine-grained control of performance without sacrificing ease of use?

Julia was designed for good overall performance: it compiles through LLVM and has good type inference.
However, Julia has some subtle issues when fine-grained control over the inner workings is needed, for example…
- What if you use data that can either be a pointer or not a pointer? (Using some bits to indicate whether it’s a pointer or not.)
- What if runtime dynamic dispatch really is needed?
- What about fixed-size mutable arrays?
- What about mutable objects living on the stack?
- What about the memory model needed to use atomic ops for lock-free data structures?
- Etc.
Some of them are being worked on.
The point is that Julia is fast in general, but it still lacks some subtle features needed to implement certain things with maximum performance.
This implies that even if, hypothetically, Julia could be as fast as C++ for code with the same semantics, Julia would still lose to C++ because it misses out on opportunities to optimize the code.
Do you think this will be solved in the future?
Or would it be too much to ask? Even C/C++ code needs to resort to assembly sometimes. Julia can do that too if needed, but it would be better if Julia had features for fine-grained control over performance. Can it do that without sacrificing ease of use?


I’ve come across some of these. Atomics work reasonably well now, I think. Dynamic dispatch could be improved, or at least its documentation. The canonical example is a vector of geometric objects, e.g. circle, triangle, square, etc. Or, in the case I came across, a vector of transformations, each of them a subtype of an AbstractTransformation.

I’m not sure about the internals now, but at the time of compiling a function taking such an argument, there exists a fixed number of subtypes, so it is possible to enumerate them and have a simple and fast lookup for dispatch on the elements of the vector. If there are more containers of abstract type it could be complicated. Likewise with complicated parametric types. Perhaps it’s possible to annotate arguments that should be treated in this way? It could of course lead to problems if further subtypes are created later, with invalidations of such functions?

As far as I know, something like this is already being done, but it is not documented in the performance tips.
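To make the "enumerate the subtypes" idea concrete, here is a minimal sketch of doing it by hand. The types `Shape`, `Circle`, `Square` and the function `total_area` are made up for illustration, and this is only one way to approach it, not a description of what the compiler does internally. Each `isa` branch narrows the type of the element, so the call inside the branch can be resolved statically, and the final branch falls back to ordinary dynamic dispatch for subtypes added later.

```julia
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Square <: Shape
    side::Float64
end

area(c::Circle) = pi * c.r^2
area(s::Square) = s.side^2

# Manual dispatch over a known set of subtypes (hypothetical example).
function total_area(shapes::Vector{Shape})
    total = 0.0
    for s in shapes
        if s isa Circle
            total += area(s)            # `s` is known to be a Circle here
        elseif s isa Square
            total += area(s)            # `s` is known to be a Square here
        else
            total += area(s)::Float64   # unknown subtype: dynamic dispatch
        end
    end
    return total
end

shapes = Shape[Circle(1.0), Square(2.0)]
total_area(shapes)
```

The trade-off is the one mentioned above: the list of branches has to be kept in sync by hand when new subtypes appear.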

This is a bit too broad and you have too many different examples, but in broad strokes, a high-level language like Julia will never have the same level of control as C++. Ease of use and safety come from leaving things to a compiler and garbage collector, so there are far fewer user mistakes that can get in the way.

Mutable objects on the stack are a good example of something that is very hard to do safely, and the compiler does it as well as most people. Bear in mind that mutable here means data shared by multiple variables, not just changing data at a location in the stack frame, which can be done with variable reassignments to immutable instances.
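A small sketch of the "reassign an immutable" pattern, using a plain tuple and the non-exported `Base.setindex`, which returns a new tuple with one element replaced. The function name `running_sums` is made up for illustration. Whether the tuple actually ends up in registers or on the stack is up to the compiler, so treat this as an illustration of the pattern rather than a guarantee.

```julia
# Accumulate into three "slots" without any mutable object:
# every update builds a new tuple and rebinds the local variable.
function running_sums(n::Integer)
    acc = (0, 0, 0)
    for i in 1:n
        slot = mod1(i, 3)                              # cycles 1, 2, 3, 1, ...
        acc = Base.setindex(acc, acc[slot] + i, slot)  # new tuple, same variable
    end
    return acc
end

running_sums(6)  # → (5, 7, 9)
```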

As for the features you listed, you don’t really need close-to-the-metal control to pull some of these off:

  • Not really sure what data that “can either be a pointer or not a pointer” means exactly. I’ve never heard of a data structure like that, but if you mean holding data inline in a struct at one field versus a Ref at another, yeah, you can make structs like that in Julia. I’ve seen it done to implement sometimes-inline arrays.
  • You can cause runtime dispatch pretty easily with type instability. If you can do static dispatch, I see no reason to opt into runtime dispatch on purpose.
  • StaticArrays.MArray is a fixed-size mutable array.

That is not true - you can do pointer shenanigans with Ptr and bit fiddling in Julia just as well as in any other “low level” language. You just don’t get any support from the Julia runtime for managing that and you’re on your own - but then again, you don’t in C either, and some perceive the added functionality/safety that C++ provides as “bloat” (which I disagree with).
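For illustration, a minimal sketch of a word that is either a pointer or a small integer, using the low bit as a tag. The layout and all names (`Tagged`, `tag_int`, `tag_ptr`, ...) are hypothetical. It assumes the pointer is at least 2-byte aligned so its low bit is free (true for `Libc.malloc` on common platforms), and that the inline integer is non-negative and fits in one bit less than a machine word. The GC knows nothing about such a pointer, so this only makes sense for manually managed memory.

```julia
# Hypothetical layout: low bit 1 => integer stored inline,
#                      low bit 0 => raw pointer (needs >= 2-byte alignment).
struct Tagged
    bits::UInt
end

tag_int(x::Integer) = Tagged((UInt(x) << 1) | UInt(1))  # non-negative x only
tag_ptr(p::Ptr{Cvoid}) = Tagged(UInt(p))

is_ptr(t::Tagged) = (t.bits & UInt(1)) == 0
get_int(t::Tagged) = Int(t.bits >> 1)
get_ptr(t::Tagged) = Ptr{Cvoid}(t.bits)

p = Libc.malloc(16)   # manually managed memory, invisible to the GC
a = tag_int(42)
b = tag_ptr(p)

is_ptr(a), get_int(a)       # (false, 42)
is_ptr(b), get_ptr(b) == p  # (true, true)

Libc.free(get_ptr(b))       # we are on our own for cleanup
```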

I do that often, and I don’t find it more cumbersome than inline assembly in C.

Do you have some concrete example you’re thinking of?


What I meant is that it would be better if Julia could implement some low-level stuff without resorting to assembly every time. One example would be an array whose size is known at run time, not compile time, but does not change after it has been constructed. Having this primitive array type would provide a basis for many data structures without needing mechanisms to expand the array and so on.

There are some low-level features for interfacing with C, but they don’t cover everything, like mutables on the stack or static variables.

Every Array except a Vector is like this.
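A quick sketch of what that means in practice, with a made-up helper `make_buffer`: an `n×1` `Matrix` gets its size at run time, supports linear indexing like a vector, but has no `push!` method, so it cannot grow after construction.

```julia
# Size chosen at run time, fixed afterwards: an n×1 Matrix instead of a Vector.
function make_buffer(n::Integer)
    buf = Array{Float64}(undef, n, 1)
    fill!(buf, 0.0)
    return buf
end

buf = make_buffer(4)
buf[2] = 3.5          # linear indexing works as for a Vector

# Growing is not supported for N-dimensional arrays with N != 1.
grew = try
    push!(buf, 1.0)
    true
catch err
    err isa MethodError ? false : rethrow()
end
```

After this runs, `grew` is `false` and `size(buf)` is still `(4, 1)`.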

How do you use assembly in Julia? :thinking: (Honest question.)

Frequently enough that it’s very convenient to abstract away and not have to think about it :slight_smile: While the inline assembly in inline LLVM-IR that I use is a bit rarer than just inline LLVM-IR, if you use SIMD.jl, pretty much every use of that uses inline LLVM-IR.

The general pattern is something like

Base.llvmcall("""
call void asm sideeffect "<asm goes here>", ""()
ret void
""",
Nothing,              # return type
Tuple{ArgTypes...},   # argument types, written as a tuple type
args...)              # argument values

i.e., using llvmcall to have LLVM-IR, which then does the actual assembly call inline in its IR.

I really wouldn’t recommend doing that outside of very narrow circumstances (say, writing a “disable interrupts” function for specific use on AVR microcontrollers) because the assembly is (obviously) architecture specific and not portable at all. Still, it’s very possible to do :person_shrugging:


llvmcall is the easiest way. For most cases LLVM IR is enough, but from LLVM IR you can do inline assembly as well.
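For the plain LLVM IR case (no assembly), a minimal sketch along the lines of the example in the `Base.llvmcall` docstring. The wrapper name `add_i32` is made up. Arguments arrive as the unnamed values `%0` and `%1`, and since the entry block implicitly takes `%2`, the first instruction is numbered `%3`.

```julia
# Add two Int32 values with hand-written LLVM IR.
function add_i32(x::Int32, y::Int32)
    Base.llvmcall("""
        %3 = add i32 %1, %0
        ret i32 %3
        """, Int32, Tuple{Int32, Int32}, x, y)
end

add_i32(Int32(2), Int32(3))  # → 5
```

Unlike inline assembly, this stays portable across architectures because LLVM still does the final code generation.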


I’m still not quite sure I follow - nothing you’ve mentioned requires writing assembly at all. Do you mean specializing the resulting code on the size of that array…?


This will always be true to some degree, not only in Julia. You can find many (exhaustive) performance comparisons of Julia with other languages (and among other languages) in which the differences in performance end up being at the level of specific compiler flags, which intrinsic math function is being called, and so on. Generally these things are only required in very localized portions of the code, and the overall ease of implementing good algorithms is much more important for performance as a whole.

Where I think Julia seems to be somewhat slower than C++ (specifically) is when dynamic dispatch is required. From what I’ve seen here, trying to match C++ performance in this case can be cumbersome, and I don’t remember there being a standard go-to solution.

Large stack allocated arrays, for example, would be (will be?) a nice addition to the language, but one can get over that rather easily with preallocation.
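A minimal sketch of the preallocation workaround, with a made-up function `col_spread!`: the caller allocates the scratch buffer once and the function reuses it on every iteration instead of creating a temporary array each time.

```julia
# For each column of A, store the sum of squared deviations from the column
# mean in `out`. `scratch` is a caller-supplied work buffer of length size(A, 1).
function col_spread!(out, scratch, A)
    for j in axes(A, 2)
        col = view(A, :, j)
        m = sum(col) / length(col)
        scratch .= col .- m          # in-place broadcast into the buffer
        out[j] = sum(abs2, scratch)
    end
    return out
end

A = [1.0 2.0; 3.0 6.0]
scratch = Vector{Float64}(undef, size(A, 1))  # allocated once, reused per column
out = Vector{Float64}(undef, size(A, 2))
col_spread!(out, scratch, A)  # → [2.0, 8.0]
```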

There is a recent proposal from Jeff to add a memory buffer type which can be used for defining array types in Julia. I don’t have a link handy.


You use Julia on AVR micros!? (or was that a hypothetical example?) I’m impressed, Julia seems sort of large to run on something like that - but then again my only real experience with AVR is Arduino.


Well, not all of Julia - the whole compiler, runtime & task scheduling obviously don’t fit on microcontrollers :joy: It’s just a small subset and there are lots of difficulties, which will be explored later this year in a JuliaCon talk :slight_smile:
