CUDAnative support for Float16

By writing GPU compatible code.

Please show a full MWE and all of the output. See the pinned post "Please read: make it easier to help you".