Modular multiplication without overflow

The magic numbers also need to be widened to 128 bits:

magic1 = (UInt128(0xFFFFFFFFFFFFFFFF) << 64)
magic2 = UInt128(0x80000000000000000000000000000000)

Testing with the (a, b, m) given above and a 128-round loop gives the right answer.
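For reference, here is a minimal sketch of how the two masks might be used, assuming the routine in question is the usual shift-and-add loop that processes one bit of `a` per round. The function name `mulmod128` is hypothetical. The explicit reduction after doubling and the overflow-safe addition of `b` are my own additions, so that moduli above 2^127 also work; they may not match the original routine.

```julia
# Hypothetical helper: computes (a * b) % m for UInt128 without overflow.
function mulmod128(a::UInt128, b::UInt128, m::UInt128)
    magic1 = UInt128(0xFFFFFFFFFFFFFFFF) << 64          # mask for the upper 64 bits
    magic2 = UInt128(0x80000000000000000000000000000000) # mask for the top bit

    # Fast path: both operands fit in 64 bits, so a * b fits in 128 bits.
    if ((a | b) & magic1) == 0
        return (a * b) % m
    end

    a %= m
    b %= m
    d = zero(UInt128)
    mp2 = m >> 1
    for _ in 1:128
        # Double d modulo m; wraparound in (d << 1) - m cancels out.
        d = d > mp2 ? ((d << 1) - m) : (d << 1)
        if d >= m            # only happens when 2d == m
            d -= m
        end
        # If the current top bit of a is set, add b modulo m
        # (assumption: written so that d + b never overflows).
        if (a & magic2) != 0
            d = d >= m - b ? d - (m - b) : d + b
        end
        a <<= 1
    end
    return d
end
```

One way to check it is against `BigInt` arithmetic, e.g. `mulmod128(a, b, m) == UInt128(mod(big(a) * big(b), big(m)))` for a few large inputs.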