Some time ago I did some experiments in this respect:
On a laptop you can quite probably hit the memory bandwidth limit. Multicore servers may have two, four, or more (?) memory channels, so depending on the task, the speedup there could be considerably larger.
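A rough way to see why: for a memory-bound loop, ideal scaling is one unit of speedup per thread, but once the threads together ask for more data than the memory channels can deliver, the speedup flattens out. The sketch below is a back-of-envelope, roofline-style bound under stated assumptions, not a measurement. The function name `bandwidth_limited_speedup`, the assumed single-thread streaming rate of 10 GB/s, and the choice of 2 vs. 8 channels are all hypothetical. The 25.6 GB/s per channel is the theoretical peak of one DDR4-3200 channel (3200 MT/s × 8 bytes); sustained bandwidth in practice is lower.

```python
# Back-of-envelope upper bound on parallel speedup for a memory-bound loop.
# All parameters are illustrative assumptions, not measured values.

def bandwidth_limited_speedup(threads: int,
                              channels: int,
                              per_channel_gbs: float,
                              per_thread_demand_gbs: float) -> float:
    """Speedup over one thread, capped by total memory bandwidth.

    Assumes each thread streams data at per_thread_demand_gbs when it runs
    alone, and that the memory system delivers at most
    channels * per_channel_gbs in total.
    """
    if threads < 1 or channels < 1:
        raise ValueError("threads and channels must be >= 1")
    total_gbs = channels * per_channel_gbs
    # Ideal scaling would be `threads`; bandwidth saturation caps it at
    # total_gbs / per_thread_demand_gbs.
    return min(float(threads), total_gbs / per_thread_demand_gbs)


if __name__ == "__main__":
    per_channel = 25.6  # peak GB/s of one DDR4-3200 channel
    demand = 10.0       # hypothetical single-thread streaming rate, GB/s
    for label, channels in [("laptop, 2 channels", 2), ("server, 8 channels", 8)]:
        for t in (1, 4, 8, 16):
            s = bandwidth_limited_speedup(t, channels, per_channel, demand)
            print(f"{label:20s} threads={t:2d} speedup <= {s:5.2f}")
```

With these made-up numbers the 2-channel machine stops scaling at about 5x no matter how many threads you add, while the 8-channel machine keeps scaling linearly up to 16 threads. Compute-bound tasks are not limited this way, which is one reason the actual speedup depends so much on the task.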