Batched matrix solve in CUDA.jl

Take a look at this discussion: "Accelerate solving many matrix problems" (post #9, by clinton).