How can I register/align shifted image stacks?

I currently have Julia running in a VM. That’s probably the main reason for the slow performance.

Might be, might not. RegisterQD uses QuadDIRECT to do global optimization, which may account for the better results, but it’s also much more expensive than local optimization. Moreover, there is no good termination criterion for global optimization, short of being able to prove there isn’t a better minimum (which can require quite exhaustive exploration). If you need to speed it up, the termination settings would be the first thing I’d look at. But it sounds like it’s good enough as-is for your needs.
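To make the termination trade-off concrete, here is a minimal sketch in plain Julia. It is a toy brute-force search over integer shifts, not RegisterQD or QuadDIRECT, and every name in it (`mismatch`, `search_shift`, `maxevals`, `fvalue`) is hypothetical and chosen only for illustration. It shows how an evaluation budget or a "good enough" target value can cut a global search short, at the risk of stopping before the best shift is found.

```julia
# Toy stand-in for a global translation search; this is NOT RegisterQD's API.
# All names here (mismatch, search_shift, maxevals, fvalue) are hypothetical.

# Mean squared difference between `fixed` and `moving` shifted by (dy, dx),
# computed only over the region where the two images overlap.
function mismatch(fixed::AbstractMatrix, moving::AbstractMatrix, dy::Int, dx::Int)
    h, w = size(fixed)
    total = 0.0
    n = 0
    for j in max(1, 1 + dx):min(w, w + dx), i in max(1, 1 + dy):min(h, h + dy)
        total += (fixed[i, j] - moving[i - dy, j - dx])^2
        n += 1
    end
    return n == 0 ? Inf : total / n
end

# Search all integer shifts in [-maxshift, maxshift]^2, stopping early when
# the evaluation budget is spent or the best value so far reaches `fvalue`.
function search_shift(fixed, moving, maxshift::Int;
                      maxevals::Int=typemax(Int), fvalue::Real=-Inf)
    best_val, best_shift, nevals = Inf, (0, 0), 0
    for dy in -maxshift:maxshift, dx in -maxshift:maxshift
        # `break` in a multi-range `for` exits the whole loop nest.
        (nevals >= maxevals || best_val <= fvalue) && break
        val = mismatch(fixed, moving, dy, dx)
        nevals += 1
        if val < best_val
            best_val, best_shift = val, (dy, dx)
        end
    end
    return (shift=best_shift, value=best_val, nevals=nevals)
end

# Synthetic test image and a copy displaced by a known amount.
fixed = [sin(0.3i) * cos(0.2j) + 0.01 * i * j for i in 1:40, j in 1:40]
moving = circshift(fixed, (-3, 2))

full   = search_shift(fixed, moving, 5)                # exhaustive search
early  = search_shift(fixed, moving, 5; fvalue=1e-8)   # stop once "good enough"
capped = search_shift(fixed, moving, 5; maxevals=20)   # stop on a fixed budget
println((full.shift, full.nevals, early.nevals, capped.nevals))
```

In this toy case the target-value stop finds the same shift with fewer evaluations, while the small budget returns whatever was best so far, which need not be the true minimum. The same tension applies to a real global optimizer: looser stopping settings are faster but give weaker guarantees.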

Regarding your questions, yes to both.