This is the ultimate blog post about it:
"What scientists must know about hardware to write fast code" by @jakobnissen