What should be the mental model of computers for new Julia devs/ software devs in general?

This question is thought-provoking, yet what it entails is not trivial at all…

I understand that some courses have students picture the computer as a single-cycle processor with no cache and memory that responds in a single cycle. Learning about things like caches and SIMD can be helpful. However, modern CPUs are dragons. Having to reason about out-of-order execution, superscalar processors, and so on all the time would be overkill.

What would be a reasonable mental model of a computer? One that’s not too complicated, but accurate enough to write fast programs?

1 Like

I would say that spatial and temporal locality of data should be clearly understood. Basic physics principles that are here to stay.
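
A minimal demo of spatial locality (exact ratios depend on your cache sizes and CPU): Julia arrays are column-major, so traversal order alone changes how cache-friendly the access pattern is.

```julia
using BenchmarkTools

# Column-major traversal: the inner loop walks down a column,
# touching adjacent memory (good spatial locality).
function sum_colmajor(A)
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    return s
end

# Row-major traversal: the inner loop strides across a row,
# jumping through memory (poor spatial locality).
function sum_rowmajor(A)
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)
        s += A[i, j]
    end
    return s
end

A = rand(4096, 4096)
@btime sum_colmajor($A)   # typically several times faster than...
@btime sum_rowmajor($A)   # ...this, despite identical arithmetic
```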

4 Likes

This is the ultimate blog post about it:

What scientists must know about hardware to write fast code by @jakobnissen

16 Likes

Write code that is easy to refactor, profile and benchmark your code, optimize bottlenecks.

A modern computer is way too complicated to hold even a fraction of the low-level details in your head, let alone learn them all. If you get within 50% of “theoretical” performance, you are fine. Reach for the low-hanging fruit.
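
To make that workflow concrete, here is a minimal sketch (assuming BenchmarkTools.jl and the Profile stdlib; `work` is a hypothetical stand-in for your real code):

```julia
using BenchmarkTools, Profile

work(n) = sum(sqrt(i) for i in 1:n)   # hypothetical stand-in for your real code

@btime work(10^6)                     # 1. measure before touching anything

Profile.clear()
@profile for _ in 1:100               # 2. sample where the time actually goes
    work(10^6)
end
Profile.print()                       # 3. optimize the top of this report, nothing else
```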

New Julia users in particular should just strive to write type-stable code and ignore everything else, especially demonstrations of micro-optimizations on this forum. They have enough on their plate.
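
To show what “type-stable” means in practice, a minimal illustration (`@code_warntype` is the tool that flags the problem):

```julia
# Type-unstable: `s` starts as an Int but becomes a Float64 inside
# the loop, so the compiler has to box values and dispatch dynamically.
function unstable(v)
    s = 0
    for x in v
        s += x / 2   # Int / Int yields Float64, changing the type of `s`
    end
    return s
end

# Type-stable: `s` keeps one concrete type throughout.
function stable(v)
    s = 0.0
    for x in v
        s += x / 2
    end
    return s
end
```

`@code_warntype unstable(rand(Int, 10))` highlights the `Union{Float64, Int64}` in red; the stable version infers cleanly and runs markedly faster.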

15 Likes

A computer is a fractally complex dragon:

  1. A modern CPU is less a straightforward executor of instructions and more a fractally complex, hardware-accelerated interpreter / virtual machine for an ossified, standardized, high-ish-level bytecode language (assembly operating on the architectural state),
  2. that was generated from a different high-ish-level bytecode language in a fractally complex manner (e.g. from LLVM IR, with heavy optimization and lots of potential undefined behavior),
  3. that was in turn generated from / interacts with a high-level human-written language in a fractally complex way (e.g. Julia, C, or C++).
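
You can watch layers (3) → (2) → (1) directly from the REPL (a quick sketch; the exact output depends on your Julia version and CPU):

```julia
using InteractiveUtils   # already loaded in the REPL

f(x) = 2x + 1

@code_typed f(3)    # layer 3: Julia after type inference
@code_llvm f(3)     # layer 2: the LLVM IR Julia generates
@code_native f(3)   # layer 1: the assembly the CPU "interprets"
```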

I would say that humility in the face of dragons is the most important thing: You never stop learning, especially since all of this is a moving target.

To give a concrete example: It’s OK if new people don’t know that or why linked lists are bad in most contexts. It’s not OK if new people “know” that linked lists are performant because they were taught from books that adhere to obsolete paradigms (linked lists were OK in the 90s when e.g. Java was developed, but since then compute has scaled much faster than memory, especially with respect to latency).
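
To see the latency gap for yourself, a minimal sketch (note that a freshly built list may still sit nearly contiguous in memory, so this understates the pain of a fragmented heap):

```julia
using BenchmarkTools

# A textbook singly linked list: every `next` hop is a potential cache miss.
mutable struct Node
    value::Int
    next::Union{Node, Nothing}
end

function build_list(n)
    head = Node(1, nothing)
    tail = head
    for i in 2:n
        tail.next = Node(i, nothing)
        tail = tail.next
    end
    return head
end

function sum_list(head)
    s = 0
    node = head
    while node !== nothing
        s += node.value
        node = node.next
    end
    return s
end

n = 10^6
list = build_list(n)
vec = collect(1:n)
@btime sum_list($list)   # pointer chasing: bound by memory latency
@btime sum($vec)         # contiguous memory: prefetchable and SIMD-friendly
```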

To teach that humility I’d look at a small handful of dragons in some level of detail.

The Spectre attack (as opposed to the very different Meltdown, which is an Intel-specific, since-fixed CPU bug) is pretty good for (1), since it teaches about cache and locality, as well as superscalar execution, latency hiding, speculative execution, and the fine distinction between machine state and architectural state.

For (2) and (3), the concept of undefined behavior / assumptions is often overlooked and quite teachable. For that, I’d recommend reading a little of the LLVM LangRef and playing around with UB-triggered “miscompiles”.

Fun exercise: Write a Julia or C function that ends with return x >= 0 ? x : 0 and happens to return a negative number in the REPL or in a small C program. Obviously you need to do bad stuff ™, involving pointers or integer overflows or race conditions or llvmcall, and this probably won’t work with -O0.
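
Spoiler, in case you want a starting point (a sketch; whether the fold actually happens depends on your Julia/LLVM version and optimization level):

```julia
# Feed LLVM a false assumption via llvmcall: we "promise" that x is
# non-negative, so the optimizer is allowed to fold the final check away.
function clamp_nonneg(x::Int32)
    Base.llvmcall((
        "declare void @llvm.assume(i1)",
        """
        %c = icmp sge i32 %0, 0
        call void @llvm.assume(i1 %c)
        ret void
        """), Cvoid, Tuple{Int32}, x)
    return x >= Int32(0) ? x : Int32(0)
end

clamp_nonneg(Int32(-7))   # may return -7: the "impossible" negative result
```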

Then meditate on how that reflects both the high level (the wording of the language spec for C/C++, or the code in Julia that emits the broken assumptions into the LLVM IR) and the intermediate representation (how LLVM works with that, and the fiddling needed to coax the optimizer into actually using the broken assumptions).

PS: the above was for people who want to do Julia / C / C++. For people who want to do JavaScript, you don’t need to look at LLVM; instead, you need to study the internals of at least one JS engine (at least enough to understand where the dragons lie). For Java people, you need to study JVM internals; here I recommend JVM Anatomy Quarks.

5 Likes

I stumbled upon this site recently; I think it is a pretty comprehensive yet high-level introduction to the matter: https://cpu.land

2 Likes

Since I was asked via PM, some more words on Spectre. I referred to Spectre because it is a very illustrative attack (and also super fun and poetic):

  1. The CPU speculates across branches. The computations inside “the speculation window” are called “transient” and are rolled back once the CPU figures out that it predicted the branch wrong. During speculation, the CPU may trigger memory loads that are fetched from main memory and placed into the cache, possibly evicting other entries.

  2. The contents of the cache (what is cached, what is evicted) are not part of the “architectural state”: You cannot ask the CPU about that. The spec / processor manual makes no promises. But it is “machine state”, and it impacts performance which can be measured (indeed, most of performance work involves reasoning about and massaging the non-architectural machine state).

  3. The state of the cache is not rolled back after speculation.

  4. There is a naive reason for not rolling back cache state after speculation: it would be super annoying and expensive for CPU designers. There is a second, more cynical reason: the memory loads caused by the mis-speculation are quite likely to be useful on the real branch. For example, you may have if(condition) foo() end; load(x) – then the same load happens either way, and if we’re latency-bound on load(x), the mis-speculation is essentially free.

This second, more cynical reason for not rolling back cache state is why I believe it is incorrect to call “spectre” a bug or laziness – it is an important tradeoff and will stay with us forever.

What do we all learn from this? Three things:

  1. The detailed behavior of code that does not run (architecturally) can impact performance and security. A purely speculative timeline can reach over and affect real performance. Like in a fantasy story.
  2. In order to plan the performance of your code, it is worthwhile to keep more than mere architectural state in mind. Plan for branch misses, latency hiding, superscalar weirdness! (See the sketch after this list.)
  3. Computers are fractally complex dragons. Like in a fantasy story. Have some humility and don’t be too fast in believing any model “true”. There is more between heaven and earth than the processor manual mentions.
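
Here is the sketch promised in (2): a branch whose predictability you control (hedged: LLVM sometimes turns this into branch-free SIMD code, in which case both timings agree; checking is the point).

```julia
using BenchmarkTools, Random

# The `if` outcome depends on the data, so the branch predictor
# only helps when the data has a pattern it can learn.
function sum_positive(v)
    s = 0
    @inbounds for x in v
        if x > 0
            s += x
        end
    end
    return s
end

v = shuffle!(vcat(fill(-1, 10^6), fill(1, 10^6)))
@btime sum_positive($v)   # random branch: frequent mispredictions
sort!(v)
@btime sum_positive($v)   # same data, sorted: the predictor locks on
```
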
10 Likes

The cryptographic community already writes branchless code. Those designing safety-critical systems should keep this in mind.

2 Likes

They do, and the techniques are fun to learn!

But we all should be humble enough to recognize that writing constant-time code (i.e. code whose running time is independent of secret inputs) is difficult, and about more than just branches. Dragons everywhere.
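
For a taste of the branchless style, a constant-time comparison sketch (constant-time at the source level only; compilers and hardware can still betray you, as below):

```julia
# Constant-time byte-vector equality: accumulate all differences with
# XOR/OR instead of returning at the first mismatch, so the running
# time does not leak *where* the inputs differ.
function ct_isequal(a::AbstractVector{UInt8}, b::AbstractVector{UInt8})
    length(a) == length(b) || return false   # length is assumed public
    acc = 0x00
    @inbounds for i in eachindex(a, b)
        acc |= a[i] ⊻ b[i]
    end
    return acc == 0x00
end
```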

To give a fun recent example to brighten your day: the amount of heat dissipated during an operation like e.g. an addition depends on the number of transistors that need to switch, which in turn depends on the actual values you’re adding. Dissipated heat determines dynamic frequency throttling (turbo mode). This is a fun timing side channel that is sometimes enough to steal private keys over the network, cf. Hertzbleed!

To inject more despair (abandon all hope, ye who believe that computers are truly understandable or controllable): Rowhammer is still not fixed (all mitigations are paper-thin).

7 Likes