Quality of Julia code and speed - is this stressed enough?

This post is the result of my reflections on trying to learn Julia from the few available resources. I hope that it will have some value.

My only problem with the learning materials is that they too often omit the relationship between code quality and execution speed (bad code is slow, good code is mostly fast). To be clear, I mostly learned Julia from January to September 2018 and came back to it a few weeks ago (April 2019), so I have only glanced over the JuliaAcademy courses.

The example from JuliaComputing/JuliaBoxTutorials shows that, on my computer, a Julia my_sum function written the same way as in Python (the language I know best) is 3x faster than the Python code and 30x slower than C (measured by minimum time). This is not true C-comparable speed. If someone judges the speed of a language just by examples like this, he or she may think that Julia’s speed is just an empty marketing device (maybe not so bad), and this makes a very bad impression of the language and its community on both beginner programmers and professionals (I have encountered such situations).

To be fair, the example with @simd is, on my computer, faster than C with the -ffast-math flag, but to me this is a sub-optimal situation. On the whole I judge this introduction to be very good; its handling of speed and code quality is the only real problem.

Here is the list of problems that I see with it.

  1. The @simd macro looks to me as if it fell from the sky, and in all the tutorials I saw it is left unexplained.
  2. According to its description in Julia 1.x, @simd is experimental, may vanish in the future, and incorrect usage can lead to unexpected results. I think any beginner programmer reading something like that should avoid it at any cost, which leaves such a person with an open question: how do I make my code fast?
  3. In the core of the introduction there is no mention of things like type stability or using arrays of non-abstract types to make your code fast. By core I mean the content of the numbered notebooks, which were the only ones discussed in every video that I saw. These topics are discussed to some extent in the notebook Exploring_benchmarking_and_performance.ipynb, but it is unnumbered and as such hidden among a dozen others in this directory. I found it only a few days ago, when I decided to work through all the notebooks there.
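For readers who have not met @simd before, here is a minimal sketch of what the macro changes in a hand-written sum. The function names my_sum and my_sum_simd are illustrative and are not copied from the notebook. The only assumption is a recent Julia 1.x; both versions use an accumulator of the array's element type, so the loop is type-stable.

```julia
# Plain loop: additions happen strictly in index order.
function my_sum(A)
    s = zero(eltype(A))          # accumulator has the element type from the start
    for i in eachindex(A)
        s += A[i]
    end
    return s
end

# Same loop with @simd: the compiler is allowed to reorder the additions so
# that it can use vector (SIMD) instructions. Reordering floating-point
# additions can change the rounding of the result slightly.
function my_sum_simd(A)
    s = zero(eltype(A))
    @simd for i in eachindex(A)
        @inbounds s += A[i]      # skip bounds checks; i always comes from eachindex(A)
    end
    return s
end

A = rand(10^6)
println("plain loop: ", my_sum(A))
println("with @simd: ", my_sum_simd(A))
println("built-in:   ", sum(A))
```

The three results should agree up to floating-point rounding, which is why comparing them with ≈ rather than == is the safer check.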

Before the conclusion I also want to mention two more things. Many introductions on YouTube had a very good practice of discussing bad, slow code alongside good, fast code, but they are now too old in terms of the language (mostly versions 0.3 and 0.4, I think) and as such unsuitable for beginners.

Also, on the Julia YouTube channel I found only one video whose topic is “High Performance Computing” in general: Arch D. Robison, “Introduction to Writing High Performance Julia (Workshop)”. While Mr. Robison’s workshop is very good, changes in the language since 2016 make working through it quite tricky.

As a conclusion, I have a few proposals; maybe not all are wrongheaded.

  1. Make the notebook Exploring_benchmarking_and_performance.ipynb part of the core introduction notebooks.
  2. Make the relationship between code quality and speed a much more visible topic in outreach. Personally, I think devoting one slide of every presentation about the Julia language just to that is a good thing (I did this myself in my last talk).
  3. It would be good to put one up-to-date video on high performance computing on YouTube, since it has a much wider reach than the JuliaAcademy page.

I apologize for my English and for every erroneous thing in this post.

7 Likes

I think the best resource is the Performance Tips section in the manual.

IMO in general, the manual is the best way to learn the language because it is kept up to date.

Julia progressed quite fast (and I hope that this will continue), and a natural side-effect is that examples and tutorials may become obsolete. This occasionally requires minimal effort (some of which can be automated), but occasionally a larger one.

Reading and understanding code from experienced Julia users (in packages) is also a great way to learn, as is following discussions here.

That said, I think that chasing the last 2x improvement by micro-optimizing should be done cautiously, since, as you have noticed, it may depend on language elements that can change quickly.

Thanks for taking the time to write up your impressions, hope that you find Julia useful.

7 Likes

Thanks for the feedback! That notebook could definitely use quite a bit more prose explaining what is going on there.

I really like that “Julia is Fast” notebook, but it could also benefit from a bit of a redesign — and I have been planning on doing so for quite a while. I’ve been torn on how to approach that simply because it could go in so many directions. As it stands, the “as fast as C” story gets muddled with lots of other topics: interoperability, type-stability, benchmarking, SIMD, best practices, floating point behaviors, hand-writing vs. library calls, and more. At the same time, I rather like that it presents a broad overview of these topics and allows for other notebooks (like the benchmarking one) to explore each in more depth. And I’ve seen others use it as a launching point for deeper exploration into many of those sub-topics.

It’s a challenging topic to present in an introductory notebook because some things (like SIMD) are relatively advanced topics but are required to match best-line performance. You probably wouldn’t see the -ffast-math flag appear in a C tutorial, for example, but since we’re comparing against finely tuned libraries like numpy we want to demonstrate how to get that sort of performance.

How are you writing my_sum? Just running that notebook myself yields:

Julia hand-written simd.....6.7
Julia built-in..............6.9
Python numpy................7.0
C -ffast-math...............8.8
Julia hand-written.........14.1
C..........................15.0
Python built-in...........921.0
Python hand-written......1113.2

Even if I do the type-unstable thing, starting with s = 0 and omitting @simd, I’m still at 14.5 ms. This is still just as fast as C without the -ffast-math flag… and I’ll note that -ffast-math comes with many of the same caveats as Julia’s @simd (excepting the experimental marking, but even though it’s marked as experimental it won’t be going anywhere). It’d be helpful to know what you’re trying.
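To make the "s = 0" variant concrete, here is a rough sketch that compares it with a version whose accumulator starts as a zero of the element type. The function names and the small min_time_ms helper are hypothetical, written only for this illustration; they are not taken from the notebook, and the timings will differ from the numbers quoted above depending on the machine.

```julia
# Accumulator starts as the Int literal 0 (the "type-unstable thing").
function my_sum_int_start(A)
    s = 0
    for a in A
        s += a
    end
    return s
end

# Accumulator starts as a zero of the element type.
function my_sum_typed_start(A)
    s = zero(eltype(A))
    for a in A
        s += a
    end
    return s
end

# Minimum wall-clock time over several runs, in milliseconds.
function min_time_ms(f, A; repeats = 20)
    f(A)                          # warm-up call, so compilation is not timed
    best = Inf
    for _ in 1:repeats
        t = @elapsed f(A)
        best = min(best, t)
    end
    return 1000 * best
end

A = rand(10^7)

# What the compiler infers as the return type for a Vector{Float64} argument.
println(Base.return_types(my_sum_int_start, (Vector{Float64},)))
println(Base.return_types(my_sum_typed_start, (Vector{Float64},)))

println("s = 0 start:        ", min_time_ms(my_sum_int_start, A), " ms")
println("zero(eltype) start: ", min_time_ms(my_sum_typed_start, A), " ms")
```

Taking the minimum over repeated runs mirrors the "measured by minimal time" approach mentioned earlier in the thread; a dedicated benchmarking package would give more careful statistics.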

13 Likes

Not sure what you’re talking about — in that example, which is based on this more-detailed notebook from my lecture notes, the naive Julia mysum implementation (modulo a little care about type stability) should be about the same speed as C (or even faster if you use @simd), not 30x slower(!?!), and much faster than a naive Python implementation or even the built-in Python sum on a built-in list.

The purpose of my original notebook on the sum function was not about how to optimize complicated software, but rather to teach (using the simplest possible example) how different performance characteristics arise from different language and library semantics — the impact of semantics that require “boxed” and/or immutable values, of container types (like Python list or Vector{Any}) that can hold any element type, whether there is a tension between performance and type-generic code, and so on.
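The effect of a container that can hold any element type can be sketched with the same loop applied to a Vector{Float64} and to a Vector{Any} holding the same numbers. This is only an illustration under the assumption of a recent Julia 1.x; the name mysum_loop is made up here, and the printed timings are machine-dependent.

```julia
function mysum_loop(A)
    s = 0.0
    for a in A
        s += a      # for Vector{Any}, the type of `a` is only known at run time
    end
    return s
end

x = rand(10^6)               # Vector{Float64}: values stored inline, contiguously
x_any = Vector{Any}(x)       # Vector{Any}: each element is a pointer to a boxed value

println("concrete eltype? ", isconcretetype(eltype(x)), " vs ", isconcretetype(eltype(x_any)))

# Warm up both methods so compilation is not included in the timings.
mysum_loop(x); mysum_loop(x_any)

println("Vector{Float64}: ", @elapsed(mysum_loop(x)), " s")
println("Vector{Any}:     ", @elapsed(mysum_loop(x_any)), " s")
```

Both calls add the same numbers in the same order, so they return the same value; only the way the elements are stored and dispatched on differs.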

12 Likes

I agree with OP. I’ve been using Julia since 0.5 and it is my primary go-to for exploratory analysis and large scale simulations. However, it seems that every time I code in a simulation (say an agent-based model), it’s not as fast as it can be. Indeed, by posting on this forum (or using Slack) I find many inefficiencies and areas where my code can be made faster using intrinsic tricks I do not know. This is evident in many of the forum posts here. I see experienced Julians point out hard-to-understand one liners or using a macro to reach C speeds. (I have started to bookmark and save these topics for my own learning).

I have recommended Julia to a few people in my department, all of whom have had trouble reaching C speeds. This is largely due to not having a good computer science background, not understanding type stability, and in general having bad programming habits because of MATLAB or R.

In general, it seems that the complexity of the language has gotten to a point where you’ll not get optimized code unless you 1) understand computer science principles and 2) read the performance tips.

I wonder how to change this. I think one way is to make Julia strongly typed. If users are forced to provide a type for every variable, a lot of folks will instantly find “C speeds” in their first attempt in coding simple problems. However, I am sure the designers already considered it. Was it not a good idea when Julia was just starting out?

1 Like

I for one would never have reached C speeds, because there’s zero chance I would have started using Julia. I barely knew what types were, and method errors referring to types were complicated enough and almost ended my interest.

3 Likes

The tradeoff here is that if you want to write code that is both high-level and fast, especially if it is type-generic (i.e. can handle many different array types, number types, etcetera), then you need to know something about how computers and compilers work and have some vaguely accurate picture of how a Julia operation or type is represented under the hood.

Basically, if you don’t understand why your Python or Matlab code was slow, probably the Julia code you write will be slow too.

Of course, this is hardly unique to Julia — even in C there is often a factor of 10–100 in speed between high-performance code by an expert programmer and naive implementations of the “same” algorithm. (Try to write a matrix-multiplication routine that competes with a fast BLAS sometime.)
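As a small illustration of the matrix-multiplication challenge, here is a sketch of a straightforward triple-loop multiply next to the built-in `*`, which calls an optimized BLAS for Float64 matrices. The function naive_matmul is hypothetical and written only for this comparison; the size of the gap depends on the machine and the matrix size.

```julia
using LinearAlgebra

# Textbook triple loop. The loop order keeps the innermost index `i` running
# down a column, which suits Julia's column-major arrays.
function naive_matmul(A, B)
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("inner dimensions must match"))
    C = zeros(promote_type(eltype(A), eltype(B)), m, n)
    for j in 1:n, p in 1:k, i in 1:m
        @inbounds C[i, j] += A[i, p] * B[p, j]
    end
    return C
end

A = rand(300, 300)
B = rand(300, 300)

naive_matmul(A, B); A * B      # warm-up calls so compilation is not timed

println("naive loops: ", @elapsed(naive_matmul(A, B)), " s")
println("built-in *:  ", @elapsed(A * B), " s")
```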

“If users are forced to provide a type for every variable.”

Then you lose the advantage of a dynamic language and the (huge) advantage of being able to write type-generic code. Even for the “trivial” example of sum, it’s pretty awesome that a single simple implementation can reach C-like speeds for Vector{Float64} and still work (and be fast!) for arbitrary user-defined container types and arbitrary number-like types (anything that has + and zero).
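A sketch of what "anything that has + and zero" means in practice: one untyped implementation, used with several element types. The Money type below is a made-up user-defined type for illustration only, and generic_sum is an illustrative name.

```julia
# One generic implementation, no type annotations.
function generic_sum(A)
    s = zero(eltype(A))
    for a in A
        s += a
    end
    return s
end

# Hypothetical user-defined number-like type: it only needs + and zero.
struct Money
    cents::Int
end
Base.:+(a::Money, b::Money) = Money(a.cents + b.cents)
Base.zero(::Type{Money}) = Money(0)

println(generic_sum([1.5, 2.5]))                 # Float64 elements
println(generic_sum([1//2, 1//3]))               # Rational elements
println(generic_sum([1 + 2im, 3 - 1im]))         # Complex elements
println(generic_sum(1:10))                       # a range, not an Array
println(generic_sum([Money(150), Money(250)]))   # user-defined elements
```

The compiler specializes generic_sum separately for each argument type it is called with, which is how the same source can be fast for Vector{Float64} and still work for the other cases.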

The good news is that (a) when it comes time to optimize your code, it is not too hard to read the performance tips and do some localized tweaks — much easier than throwing your code in the trash and rewriting it in C — and (b) competent programmers can write fast, composable, powerful type-generic libraries in Julia that would be much harder to provide in any other language.

15 Likes

Exactly. I suppose it doesn’t help that I advertise Julia as “write matlab code, get C speeds” :smiley:

I don’t think I agree with the first one — the principles that need to be understood for efficient Julia barely scratch the surface of CS, and are very basic (eg memory access patterns). And they also need to be understood for any language if you want fast code.

As for the second point, I see no problem with reading the performance tips if you want performance.

The bottom line is that there is no programming language at the moment in which you get optimally performant code automatically. Julia is no exception, it just makes it much easier.

The implicit promise in “Julia is fast” is not that “anything you write will be fast automatically out of the box”, rather that you can (1) make it fast with relatively little effort, and (2) write generic code, so you or others can reuse the results of this effort later on.
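Memory access patterns, mentioned above as one of the very basic principles, can be sketched with two loops over the same matrix. Julia stores arrays in column-major order, so walking down columns touches memory contiguously while walking along rows does not. The function names are illustrative, and the timings depend on the machine and the matrix size.

```julia
# Inner loop runs over rows i within a fixed column j: contiguous access.
function sum_column_order(M)
    s = zero(eltype(M))
    for j in axes(M, 2), i in axes(M, 1)   # the last iterator is the innermost loop
        s += M[i, j]
    end
    return s
end

# Inner loop runs over columns j within a fixed row i: strided access.
function sum_row_order(M)
    s = zero(eltype(M))
    for i in axes(M, 1), j in axes(M, 2)
        s += M[i, j]
    end
    return s
end

M = rand(3000, 3000)

sum_column_order(M); sum_row_order(M)      # warm-up calls

println("column order: ", @elapsed(sum_column_order(M)), " s")
println("row order:    ", @elapsed(sum_row_order(M)), " s")
```

The same consideration applies in other languages, with the roles reversed for row-major layouts such as C arrays.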

13 Likes

I think the main problem here is that Matlab is the hardest language to port to Julia and get a speedup.
That is because Matlab code is written in a way that is largely at odds with writing performant code in any other language, but Matlab, with their huge money bag, found a way to make those crazy anti-patterns fast.

The main problem for Julia here is that there is only a small benefit in catering to such people: Julia works perfectly fine and is fast if someone hasn’t been tainted by Matlab’s anti-patterns… And Julia doesn’t have the same resources to just throw money at the problem. So if you switch from a $2000 product to an open source language like Julia, you will need to be prepared to relearn how to structure your code.

All other problems when trying to write code with optimal performance seem common to any other language.

The good news is that the main bottleneck when coming from Matlab to Julia is array allocator performance, which might get huge improvements in the future to improve our ML use cases, and that will also help to make Matlab-style code fast :slight_smile:

I’m convinced that this is not true. The only performance improvements one could get out of that are for cases that would otherwise be type-unstable.
The thing is, those are the easiest to fix: there are lots of tools to spot them (@code_warntype, Traceur), and Julia >= 1.0 got huge optimizations lately, so they aren’t even that slow anymore.
Forcing everyone to annotate their variables would completely change the language. Doing that just so a couple of people don’t make an easy-to-spot beginner mistake seems … intense! Especially since you would lose a lot of Julia’s usability, so you would basically immediately lose all those relaxed users who “simply want fast and easy code”.
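As a sketch of how such a case is spotted, here is a tiny type-unstable function, its fix, and a call to @code_warntype on each. The function names are made up for this illustration. Outside the REPL the macro comes from the InteractiveUtils standard library.

```julia
using InteractiveUtils   # provides @code_warntype outside the REPL

# Type-unstable: for a Float64 argument this returns either a Float64 (x)
# or an Int (the literal 0), depending on the value of x.
clamp_unstable(x) = x > 0 ? x : 0

# Type-stable: zero(x) has the same type as x, so the return type is fixed.
clamp_stable(x) = x > 0 ? x : zero(x)

# The report for the unstable version shows a Union return type;
# the stable version shows a single concrete type.
@code_warntype clamp_unstable(1.5)
@code_warntype clamp_stable(1.5)

# The same information, obtained programmatically.
println(Base.return_types(clamp_unstable, (Float64,)))
println(Base.return_types(clamp_stable, (Float64,)))
```

Note that neither function carries a type annotation; the fix is in how the value is constructed, not in declaring types.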

9 Likes

Maybe you can explain what ‘C speeds’ means in this context? Are you saying that the Julia code they write is much slower than the C code that they write, meaning that simple, straightforward C code is much faster than simple, straightforward Julia code? Or does ‘C speeds’ mean ‘performance of professionally written and hand-optimized C code’?

3 Likes

Ehm, just because you don’t understand it, it’s not an anti-pattern …
(Or, as I learned in university: a real programmer can write FORTRAN programs in any language.)
There is a reason why Matlab has the name Matlab, and there are rumors it’s connected to ‘Lab’ and ‘Matrix’.
I (tend to) have an engineering background, and a surprisingly long list of problems and solutions are matrix or matrix/vector problems. It’s quite nice to have a language available that supports something close to the math notation, blazingly fast library functions, and great support for implicit loops, logical indexing and other magic to write short, consistent and expressive code.

Yes, I’m aware that people try to do strange things with Matlab, using it as a general programming language, and get caught in the ‘vectorise this to make it fast’ domain.

3 Likes

The anti-pattern is that in Matlab this is the idiomatic and fast way to program for large arrays, x and y:

sum(sqrt(x.^2 + y.^2) > 1)

This creates 5 temporary arrays until you finally sum over the last one. It’s pretty crazy that they are able to make this fast, because it’s clearly suboptimal. The right way would be to take an element from each of x and y, operate on them, and then accumulate, creating zero temporary arrays.

The reason this is idiomatic Matlab code is that they batch operations to make them fast.

Given Matlab’s improved JIT I wonder if maybe a straight loop would also be pretty fast these days, but the vectorized approach is what everyone knows and uses for now, and it’s what some of them translate into Julia code, and expect to be fast.

In Julia you would write a loop, or pass an anonymous function to sum.
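The two Julia options just mentioned can be sketched as follows, next to a dotted version for comparison. The function names are illustrative. In the dotted version Julia fuses all the dot operations into a single pass, so it allocates one temporary Boolean array rather than five.

```julia
# Explicit loop: take one element from each of x and y, test, accumulate.
function count_outside_loop(x, y)
    n = 0
    for i in eachindex(x, y)
        if sqrt(x[i]^2 + y[i]^2) > 1
            n += 1
        end
    end
    return n
end

# Anonymous function passed to sum: no temporary arrays at all.
count_outside_sum(x, y) = sum(t -> sqrt(t[1]^2 + t[2]^2) > 1, zip(x, y))

# Dotted (broadcast) version, closest to the Matlab expression.
count_outside_dots(x, y) = sum(sqrt.(x .^ 2 .+ y .^ 2) .> 1)

x = rand(10^6)
y = rand(10^6)

println(count_outside_loop(x, y))
println(count_outside_sum(x, y))
println(count_outside_dots(x, y))
```

All three compute the same count; they differ only in how much intermediate storage they need.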

6 Likes

OK, so how do you know? Where in matlab is this visible?

Well, you can just look at the code and see. The temporaries are right there in the code. Or do you mean perhaps they are able to cleverly reuse some of the memory? Sure, they might have some magic up their sleeves.

The way to confirm it would be to use the profiler and enable memory profiling. But I know from experience that this style of programming is very memory-intensive, and can bog down a program completely.

But even if they are able to optimize away some of the allocations, it’s still an anti-pattern, because you cannot depend on an optimizer to be able to fix this for you. And if you move this pattern to a different language with a different compiler, you get bad performance, which was the point.
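What happens when the pattern is moved to Julia can be checked directly with the @allocated macro. The sketch below writes out the five temporaries explicitly (to mirror the Matlab expression step by step) and compares the bytes allocated with those of a plain loop. The function names are made up for this illustration, and it says nothing about what Matlab itself allocates.

```julia
# Each line creates one full-size temporary array, as in the vectorized style.
function count_with_temporaries(x, y)
    t1 = x .^ 2
    t2 = y .^ 2
    t3 = t1 .+ t2
    t4 = sqrt.(t3)
    t5 = t4 .> 1
    return sum(t5)
end

# Element-by-element loop: no temporary arrays.
function count_with_loop(x, y)
    n = 0
    for i in eachindex(x, y)
        n += sqrt(x[i]^2 + y[i]^2) > 1
    end
    return n
end

x = rand(10^6)
y = rand(10^6)

# Warm-up calls so that compilation is not counted as allocation.
count_with_temporaries(x, y); count_with_loop(x, y)

bytes_temps = @allocated count_with_temporaries(x, y)
bytes_loop = @allocated count_with_loop(x, y)

println("temporaries: ", bytes_temps, " bytes")
println("loop:        ", bytes_loop, " bytes")
```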

6 Likes

Hmm?

I don’t know what code you ran here on what data. But as I said, this relies on Matlab being very clever at optimizing your code (this is pretty much what Matlab is all about). It’s really hard to predict when these optimizations are going to work, and when they will fail.

As a pattern, this is going to turn bad when you move it to another language, such as, for example, Julia.

This is relevant to this quote, which you reacted to:

3 Likes

I’m not sure what misunderstanding led to your comment, but we seem to be talking about the same thing, although it sounds like you’re trying to correct me.

I guess it happened when I talked about anti-patterns without clarifying that those are anti-patterns for executing a language on a CPU with maximum performance. I haven’t said anywhere that it’s an anti-pattern for writing math down beautifully.

To quote you:

And that’s exactly why Matlab made this use case super fast, which is what my whole point builds upon: it’s simply hard to compete with Matlab in that domain, especially if you don’t charge $2000 per license, don’t have a huge compiler team, and actually have a fast alternative way to write that code, which takes away the pressure to sink millions into compiler optimizations.

edit:
Maybe this also arose because I sounded very dismissive of those “anti-patterns”, while I actually have quite some respect for Matlab making these wasteful (from a compiler viewpoint) but useful patterns fast.

2 Likes

x and y were rand(40000000,1), and f11 was:

function a = f11(x,y)
a = sum(sqrt(x.^2 + y.^2) > 1);
end

But I’m confused now: you claimed that, from your experience, this is memory-intensive, needs a special optimizer, and that the memory impact can only be read from the profiler.

1 Like

Yes, I have experience writing a lot of highly vectorized Matlab code, which uses a lot of memory, as also shown by the profiler. I will not claim any particular experience with this particular code snippet, which I made up on the spot and which apparently is optimized, only with this style of programming.

2 Likes