Speed up very elemental functions

You can also try @tturbo (multithreaded) or @turbo (single-threaded) from LoopVectorization.jl:

@tturbo out .= energies_spin.(...

It seems to be a bit slower than vmap/vmapntt! for some reason, but it still gives a dramatic speedup (20-30x on my 6-core machine).
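For anyone who wants to try the pattern in isolation, here is a minimal self-contained sketch, assuming LoopVectorization.jl is installed. Since the original energies_spin call is elided, `spin_energy`, `h` and `J` below are hypothetical stand-ins for a cheap, branch-free scalar function and its parameters. Start Julia with several threads (e.g. `julia -t 6`) so that @tturbo can actually use more than one core.

```julia
using LoopVectorization  # provides @turbo, @tturbo and vmap

# Hypothetical stand-in for energies_spin: a cheap, branch-free elementwise function.
# It only uses * and -, so it also works on the SIMD vector types LoopVectorization passes in.
spin_energy(s, h, J) = -J * s * s - h * s

n = 10_000
s = rand(Float64, n)
h = 0.5   # hypothetical field parameter
J = 1.0   # hypothetical coupling parameter

# Plain broadcast as a reference result
ref = spin_energy.(s, h, J)

# Single-threaded SIMD version of the same in-place broadcast
out_turbo = similar(s)
@turbo out_turbo .= spin_energy.(s, h, J)

# Multithreaded SIMD version (uses the threads Julia was started with)
out = similar(s)
@tturbo out .= spin_energy.(s, h, J)

# vmap alternative for comparison: returns a new array
out_vmap = vmap(x -> spin_energy(x, h, J), s)
```

Timing each variant (for example with BenchmarkTools' @btime) on your own function and array sizes is the easiest way to see which one wins on a given machine.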