Do you have special hardware for this? NVIDIA Tesla GPUs? Most GPUs are crippled for double precision, so you should only use double precision on the GPU if you have the right hardware. On most consumer GPUs, double precision runs roughly 32x slower than single precision. GPU memory does matter a lot, but this throughput difference is more likely to be the problem.
What kind of financial models? Optimization problems? Machine learning? Those are very robust to using single precision. SDE models? Those can use single precision, but in many cases do better with double precision.
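As a rough illustration of why long time-stepping loops (like the ones inside an SDE solver) can be sensitive to precision, here is a minimal sketch that just accumulates a fixed step many times in each precision. The function name `accumulate_steps`, the step size, and the step count are made up for this example, and it runs on the CPU; it is not a statement about any particular model or solver.

```julia
# Hypothetical helper: add a step of size 0.1 to a running total n times,
# doing all arithmetic in the floating point type T.
function accumulate_steps(::Type{T}, n) where {T}
    dt = T(0.1)      # the step, rounded to type T
    t = zero(T)
    for _ in 1:n
        t += dt      # rounding error accumulates at every step
    end
    return t
end

n = 1_000_000
exact = 100_000.0    # 0.1 * n in exact arithmetic

t32 = accumulate_steps(Float32, n)
t64 = accumulate_steps(Float64, n)

err32 = abs(Float64(t32) - exact)
err64 = abs(t64 - exact)

println("Float32 total: ", t32, "  abs error: ", err32)
println("Float64 total: ", t64, "  abs error: ", err64)
```

The Float32 error comes out far larger than the Float64 one. Whether that matters depends on the problem: an optimizer or a machine learning fit will often not care, while a long simulation might.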
Not necessarily. Many problems can be handled exclusively by GPUArrays.jl.
“Can happen in theory” is a long way from “works great!”
See the CUDArt.jl README.