Intel added vector instructions for converting to and from 16-bit floats many years ago, and in fact showed that, because 16-bit floats take half the memory and make better use of the cache, using them could be faster than 32-bit for larger operations, and not that much slower for smaller vectors.
https://software.intel.com/en-us/articles/performance-benefits-of-half-precision-floats
It seems that making sure Julia can use those SIMD conversion instructions when doing vector operations on 16-bit floats could achieve some nice performance benefits.
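To make the idea concrete, here is a rough sketch of the pattern I have in mind: keep the data stored as `Float16` (half the bytes) and widen to `Float32` only for the arithmetic. The helper name `sum_widened` is made up for illustration, and whether the loop actually compiles down to the hardware conversion instructions will depend on the Julia version and the CPU, so treat this as an assumption to check rather than a claim.

```julia
# Store the data as Float16: 2 bytes per element instead of 4.
x32 = rand(Float32, 1_000_000)
x16 = Float16.(x32)

# The Float16 copy occupies half the memory of the Float32 original.
half_the_bytes = sizeof(x16) * 2 == sizeof(x32)

# Hypothetical helper (not part of Base): read Float16 storage,
# but accumulate in Float32 so the arithmetic is done in single precision.
function sum_widened(v::AbstractVector{Float16})
    acc = 0.0f0
    @inbounds @simd for i in eachindex(v)
        acc += Float32(v[i])   # widen each element before adding
    end
    return acc
end

total = sum_widened(x16)
```

Running `@code_native sum_widened(x16)` in the REPL should show whether the conversions were vectorized on a given machine.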