Nothing directly to do with Julia, but I found this interesting in light of choices Julia makes around C interop, struct layout, etc., as well as the tension between CPU and GPU code generation. It also mentions LLVM a few times. I think some here might enjoy it.
“A programming language is low level when its programs require attention to the irrelevant.” I wondered recently what you need to know about the underlying system to write Julia code.
For example, in https://github.com/JuliaLang/julia/issues/26938 there is: “More generally, the problem is that inexperienced Julia programmers allocate unnecessary little arrays in lots of ways, not just the one cited here, because they’ve been trained by other languages that “vectorization is always good”, and it will be difficult to get the compiler to recognize and unroll more than a tiny subset of these cases.”
Another example is: “proper looping in X-first order”.
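Concretely: Julia arrays are column-major, so the inner loop should run down the rows of a column. A minimal sketch (colmajor_sum is just an illustrative name):

function colmajor_sum(A)
    s = zero(eltype(A))
    for j in axes(A, 2)       # outer loop over columns
        for i in axes(A, 1)   # inner loop over rows walks contiguous memory
            s += A[i, j]
        end
    end
    return s
end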
Shouldn’t a language for non-professional programmers (researchers) free you from that?
Thanks for sharing the link! We are slowly moving toward SOACs (second-order array combinators; see e.g. “2. The Futhark Language – Parallel Programming in Futhark”) to program both GPUs and modern CPUs. The only problem with this trend is that the adoption of new programming paradigms tends to get harder with age.
Generating code for it from a functional-style map operation is trivial: the length of the mapped array is the degree of available parallelism.
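In Julia terms, a minimal sketch of that idea might look like this (tmap is a hypothetical helper; it assumes f preserves the element type, and you need to start Julia with julia -t N to get actual threads):

using Base.Threads
function tmap(f, xs)
    ys = similar(xs)                  # assumes f(x) has the same type as x
    @threads for i in eachindex(xs)   # each element is an independent unit of work
        ys[i] = f(xs[i])
    end
    return ys
end
tmap(sin, randn(10^6))                # parallelism bounded by the array length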
Free you from what, exactly? From having the way you write the code affect performance? That sounds nice, of course, but it’s hardly realistic.
I think the best you can hope for, and what Julia aspires to, is to make code that is intuitive also fast. But if people have been trained in a different programming paradigm, there isn’t a lot you can do if you don’t want to adopt that paradigm, I guess.
To write high-performance code in any language, you have to know something about how computers and compilers work and the performance characteristics of any functions you are calling. Julia is no different.
What Julia does is to make it possible to write high-performance code in a dynamic language while remaining high level and type-generic, often with relatively small tweaks.
A user’s first Julia programs, on the other hand, especially if they have habits from languages that work very differently, are likely to be slow. That is especially true if they don’t have a correct mental model of how computers work: if you are a Python or Matlab programmer who doesn’t understand why those languages are slow, it will be harder for you to make Julia code run fast.
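As a rough illustration of the kind of small tweak meant here (sumsq_vec and sumsq_loop are just illustrative names): the vectorized habit carried over from Matlab or NumPy allocates a temporary array, while a plain loop compiles to tight native code and stays type-generic.

sumsq_vec(x) = sum(x .^ 2)    # allocates a temporary array for x .^ 2
function sumsq_loop(x)
    s = zero(eltype(x))
    for v in x
        s += v * v            # no temporaries, still works for any number type
    end
    return s
end

(The idiomatic allocation-free one-liner would be sum(abs2, x).)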
Perhaps it’s time to stop trying to make C code fast and instead think about what programming models would look like on a processor designed to be fast.
That is interesting. Will we ever see a Julia-specific processor?
The chance of that is something like 0.0000000000000000001%. I remember from the very beginning of Java (after it was made public) that Java chips were made by Sun (taking inspiration from the Lisp machines of much earlier). These were basically SPARC chips that understood Java bytecode. This was in the era when Java bytecode was interpreted and there was no JIT, i.e. pre-HotSpot JVM, so pre-Java 1.2. Java was slow then on standard CPUs like Intel’s. The chips were not a success, however, and were only produced for a short while. When the HotSpot JVM with JIT compilation became available the next year, it was game over for the chips: JITting was just a much cheaper and better solution.
A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model. Running C code on such a system would be problematic, so, given the large amount of legacy C code in the world, it would not likely be a commercial success.
IBM’s recent POWER processors seem to follow some of these prescriptions?
4–8 threads per core, a relatively short pipeline, and 4 or 8 vector units per core (1 per “slice”).
For what it’s worth, here are some benchmarks comparing a computer with this processor to a dual Intel Xeon and two AMD Epyc systems:
I don’t know enough to recognize an obvious pattern in those numbers. It does sound like they tested a lower-end POWER9, and the C article of course predicts that “Running C code on such a system would be problematic”.
EDIT: checking online, I see a lot of criticism of that benchmark. Apparently the x86_64 CPUs used cost far more, which makes the comparison not particularly informative.
Isn’t Julia really C-like? That is, shouldn’t we expect Julia to run faster on processors optimized for running C programs than those optimized for some other concept of speed?
We’d have to really change how we structure programs. Normally, isn’t most of our code written in a serial fashion?
Although it seems like a compiler could build a dependency graph of the computations, figure out which are safe to do simultaneously, and therefore send independent sequences down separate pipelines? For example:
d = randn(500,1000);
a, b, c = 3.2, 5.4, 7.6
# three independent chains: (i, w), (j, x), and (k, y) share no intermediate results
i = 6a + 7d[3,10]
j = 4b + 5d[95,988]
k = 2c + 3d[498,20]
w = exp(i) + d[4,10]
x = log(j) + d[96,988]
y = sin(k) + d[488,20]
z = w - x + y   # only this final step depends on all three chains
For such a small example, you’d obviously want all the instructions on the same core, to avoid moving data around. Would it really be advantageous to split them up among separate pipelines to the same compute units? To reduce the risk of stalling while one of the loads from the heap-allocated array arrives?
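For illustration only, the three chains could even be spelled out as explicit tasks (a sketch; for work this small the scheduling overhead would swamp any gain, so this just makes the dependency structure explicit):

using Base.Threads: @spawn
d = randn(500,1000)
a, b, c = 3.2, 5.4, 7.6
tw = @spawn exp(6a + 7d[3,10]) + d[4,10]       # chain 1
tx = @spawn log(4b + 5d[95,988]) + d[96,988]   # chain 2
ty = @spawn sin(2c + 3d[498,20]) + d[488,20]   # chain 3
z = fetch(tw) - fetch(tx) + fetch(ty)          # the only join point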
So, am I right in thinking that either the compiler is going to have to do a lot of work adapting to a different paradigm, or we will have to write code differently?
I don’t know much about hardware or how compilers actually work, so I’d be happy to learn more.
Interesting thought. I’d also agree that Julia code in execution is rather close to C, so there’s no need to specialize for that. However, in typical Julia programs the access to large amounts of data, both sequential and strided/random, can be a limiting factor (the LOAD-EXECUTE-STORE pipeline), so designing memory hierarchies to enable seamless SIMD or MIMD might be worth it.
In the end we are reiterating the old supercomputer definition here: a supercomputer is a device that turns a compute-bound problem into an I/O-bound problem…
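A micro-example of that memory-limited behavior, as a sketch (strided_sum is just an illustrative name):

function strided_sum(A, stride)
    s = zero(eltype(A))
    @inbounds for i in 1:stride:length(A)   # linear indexing into the flat data
        s += A[i]
    end
    return s
end
# strided_sum(randn(500,1000), 1) walks contiguous memory;
# strided_sum(randn(500,1000), 64) touches at most one element per cache line,
# so throughput is set by the memory system rather than the arithmetic units.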