Nothing directly to do with Julia, but I found it interesting in light of choices in Julia around C-interop, struct-layout, etc., as well as the conflict between CPU and GPU code generation. It does also mention LLVM a few times. I think some here might find it interesting.
“A programming language is low level when its programs require attention to the irrelevant.” I wondered recently what you actually need to know about the underlying system to write Julia code.
For example in https://github.com/JuliaLang/julia/issues/26938 there is “More generally, the problem is that inexperienced Julia programmers allocate unnecessary little arrays in lots of ways, not just the one cited here, because they’ve been trained by other languages that “vectorization is always good”, and it will be difficult to get the compiler to recognize and unroll more than a tiny subset of these cases.”
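A minimal sketch of the habit that quote describes (my own made-up example, not code from the linked issue): the "vectorized" version allocates a temporary little array for every slice and every broadcast, while the plain loop allocates nothing.

```julia
# "Vectorized" habit carried over from Matlab/NumPy: A[:, j] copies the
# column, and .^ 2 allocates yet another temporary per iteration.
function colsum_vectorized(A)
    s = 0.0
    for j in 1:size(A, 2)
        s += sum(A[:, j] .^ 2)
    end
    return s
end

# The same computation as a plain loop: no temporaries at all.
function colsum_loop(A)
    s = 0.0
    for j in 1:size(A, 2), i in 1:size(A, 1)
        s += A[i, j]^2
    end
    return s
end
```

(Using `sum(abs2, @view A[:, j])` would also avoid the copies while keeping the one-liner style.)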
Another example is: ‘proper looping in X-first order’.
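For the looping-order point, a sketch (relying only on the fact that Julia arrays are column-major, so the first index should vary fastest in the innermost loop):

```julia
# Sequential memory access: elements of a column are adjacent in memory.
function sum_colmajor(A)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    return s
end

# Strided access: each step jumps size(A, 1) elements, hurting cache use.
function sum_rowmajor(A)
    s = zero(eltype(A))
    for i in axes(A, 1), j in axes(A, 2)
        s += A[i, j]
    end
    return s
end
```

Both return the same answer; on large matrices only the first traversal walks memory in order.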
Shouldn’t a language aimed at non-professional programmers (researchers) free you from that?
Free you from what, exactly? From having the way you write the code affect its performance? That sounds nice, of course, but it’s hardly realistic.
I think the best you can hope for, and what Julia aspires to, is to make intuitive code fast. But if people have been trained in a different programming paradigm, there isn’t a lot you can do short of adopting that paradigm, I guess.
To write high-performance code in any language, you have to know something about how computers and compilers work and the performance characteristics of any functions you are calling. Julia is no different.
What Julia does is to make it possible to write high-performance code in a dynamic language while remaining high level and type-generic, often with relatively small tweaks.
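A toy example of such a “relatively small tweak” (my own, not from the post above): the classic advice to pass data as arguments instead of reading untyped globals. The arithmetic is identical; only where the coefficient lives changes.

```julia
coeff = 2.0   # untyped global: its type could change, so uses of it are slow

function scale_global!(out, x)
    for i in eachindex(out, x)
        out[i] = coeff * x[i]   # dynamic lookup on every iteration
    end
    return out
end

function scale_arg!(out, x, c)  # pass it in: `c` now has a concrete type
    for i in eachindex(out, x)
        out[i] = c * x[i]
    end
    return out
end
```

The second version stays high level and type-generic for free: the compiler specializes it for `Float64`, `Float32`, `Int`, or anything else you pass.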
A user’s first Julia programs, on the other hand, especially if they have habits from languages that work very differently, are likely to be slow. Especially if they don’t have a correct mental model of how computers work—if you are a Python or Matlab programmer who doesn’t understand why those languages are slow, it will be harder for you to make Julia code run fast.
The chance of that is something like 0.0000000000000000001%. I remember from the very beginning of Java (just after it was made public) that Java chips were made by Sun, taking inspiration from the Lisp machines of much earlier. These were basically SPARC chips that understood Java bytecode natively. This was in the era when Java bytecode was interpreted and there was no JIT (i.e., the pre-HotSpot JVM, so before Java 1.2), and Java was slow on standard CPUs like Intel's. The chips were not a success, however, and were only produced for a short while. When the HotSpot JVM with its JIT became available the next year, it was game over for the chips. JIT compilation was just a much cheaper and better solution.
A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model. Running C code on such a system would be problematic, so, given the large amount of legacy C code in the world, it would not likely be a commercial success.
IBM’s recent POWER processors seem to follow some of these prescriptions?
4-8 threads per core, and a relatively short pipeline, and 4 or 8 vector units per core (1 per “slice”).
For what it’s worth, here are some benchmarks comparing a computer with this processor to a dual Intel Xeon and two AMD Epyc systems: https://www.phoronix.com/scan.php?page=article&item=power9-epyc-xeon&num=1
I don’t know enough to recognize an obvious pattern in those numbers. It does sound like they tested a lower-end POWER9, and the C article does, of course, predict that “running C code on such a system would be problematic”.
EDIT: checking online, I see a lot of criticism of that benchmark. Apparently the x86_64 CPUs used cost far more, making the comparison not particularly informative.
Isn’t Julia really C-like? That is, shouldn’t we expect Julia to run faster on processors optimized for running C programs than those optimized for some other concept of speed?
We’d have to really change how we structure programs. Normally, isn’t most of our code written in a serial fashion?
Although, it seems like a compiler could build a dependency graph of computations, and figure out which are safe to do simultaneously, and therefore send the sequences down separate pipelines?
d = randn(500,1000);
a, b, c = 3.2, 5.4, 7.6
i = 6a + 7d[3,10]
j = 4b + 5d[95,988]
k = 2c + 3d[498,20]
w = exp(i) + d[4,10]
x = log(j) + d[96,988]
y = sin(k) + d[488,20]
z = w - x + y
For such a small example, you’d obviously want all the instructions on the same core, to avoid moving memory around. Would it really be advantageous to split them up among separate pipelines to the same computing units? To reduce the risk of stalling while waiting for one of the loads from the heap-allocated array to arrive?
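For what it’s worth, the dependency structure in that snippet can already be expressed by hand with tasks; a sketch (for work this tiny the scheduling overhead would dwarf the computation, so this only illustrates the shape, not a speedup):

```julia
using Base.Threads

d = randn(500, 1000)
a, b, c = 3.2, 5.4, 7.6

# The three chains (i→w, j→x, k→y) are independent of each other;
# only the final combination z needs all three results.
tw = @spawn exp(6a + 7d[3, 10]) + d[4, 10]
tx = @spawn log(abs(4b + 5d[95, 988])) + d[96, 988]  # abs only to keep log in-domain for random inputs
ty = @spawn sin(2c + 3d[498, 20]) + d[488, 20]

z = fetch(tw) - fetch(tx) + fetch(ty)
```

A compiler (or hardware) that built the same dependency graph automatically could in principle issue those chains down separate pipelines, which is exactly the question above.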
So, am I right in thinking that either the compiler is going to have to do a lot to adapt to a different paradigm, or we will have to change how we write code?
I don’t know much about hardware or how compilers actually work, so I’d be happy to learn more.
Interesting thought. I’d also agree that Julia code in execution is rather close to C, so no need to specialize on that. However, in typical Julia programs the access to large amounts of data, both in sequential form and in strided/random access, can be a limiting factor (LOAD-EXECUTE-STORE pipeline), so designing memory hierarchies to enable seamless SIMD or MIMD might be worth it.
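A toy illustration of that memory-access point (my own example): summing the same data sequentially vs. through a shuffled index vector. The arithmetic is identical; only the access pattern changes, and on large inputs the random gather is much slower because it defeats prefetching and SIMD.

```julia
using Random

# Sequential loads: the hardware prefetcher keeps the pipeline fed.
function sum_sequential(x)
    s = 0.0
    for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Same arithmetic, but loads jump around memory via a permuted index.
function sum_gather(x, idx)
    s = 0.0
    for i in idx
        s += x[i]
    end
    return s
end

x = rand(10^7)
idx = shuffle(1:10^7)
```

Timing both (e.g. with `@time`) on inputs much larger than the last-level cache shows the LOAD-bound gap, even though both do exactly 10^7 additions.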
In the end we are iterating the old supercomputer definition here: a supercomputer is a device that turns a compute-bound problem into an I/O-bound problem …