I disagree with this. Always go for basic high-performance code: it’s just too easy to do, and it will become habitual over time. It lets us ask much better questions and get faster feedback. Just don’t worry about the last 30% difference; that’s the mistake newbs need to be warned against.
I just don’t see the time and energy you are talking about, except in chasing the last dregs of performance. For example, once you understand array preallocation, you just define the arrays you need before the loop instead of inside it. It’s no harder. This even applies to what we used to think of as the dark arts of performance. Once you understand the limitations on functions that will run with GPUArrays, you can just write code within those limits and it just works, with very little effort.
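To make the preallocation point concrete, here is a minimal sketch in plain Python. The function names and the sum-of-squares task are made up for illustration, and it assumes every row has the same length. How much the second version actually saves depends heavily on the language and array library, so treat it as showing the shape of the habit rather than a benchmark.

```python
def row_sums_naive(rows):
    """Sum of squares per row, building a fresh scratch list on every pass."""
    sums = []
    for row in rows:
        squared = [x * x for x in row]  # new list allocated inside the loop
        sums.append(sum(squared))
    return sums


def row_sums_preallocated(rows):
    """Same result, but the scratch and output buffers are created once.

    Assumption for this sketch: all rows have the same length.
    """
    width = len(rows[0]) if rows else 0
    squared = [0.0] * width     # scratch buffer, defined before the loop
    sums = [0.0] * len(rows)    # output buffer, defined before the loop
    for i, row in enumerate(rows):
        for j, x in enumerate(row):
            squared[j] = x * x  # overwrite in place, no new list
        sums[i] = sum(squared)
    return sums


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6]]
    print(row_sums_naive(data))
    print(row_sums_preallocated(data))
```

The second version is barely longer than the first, which is the whole argument: the buffers just move above the loop.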
Similarly, always write relatively modular code (except for pure glue scripts). You will learn to do that habitually too, and everything you write will be more useful to yourself and other people.