By the way, I’m by no means an expert on CPU hardware and have very little understanding of how the coherence protocol actually works. In particular, I don’t know how on earth certain cases of false sharing do not happen on Intel and IBM CPUs. So, I’d appreciate it if the experts could comment on or fact-check the false sharing part.
(Of course, comments on other points are welcome too.)