GPUArrays, 64- to 32-bit conversions, and Cassette.jl

I’m writing some frameworks that run arbitrary user code on the GPU, but I’m missing out on the performance gains of using only 32-bit numbers.

Is anyone else thinking about automatic 32/64-bit code transformations using Cassette.jl? Or other options?

I’m not sure if it’s the recommended option, but there is https://github.com/stevengj/ChangePrecision.jl
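If it helps, basic usage looks roughly like this (a minimal sketch based on the package README; it rewrites floating-point literals and calls like `rand` inside the block):

```julia
using ChangePrecision

@changeprecision Float32 begin
    x = 0.1        # this literal is parsed as Float32
    v = rand(100)  # rand defaults to Float32 inside the block
    sum(x .* v)    # stays Float32, as long as the code lives in this block
end
```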

Oh thanks I hadn’t seen that. But this might be an issue:

> Code hidden inside external functions that are called is not affected.

That’s often the code I’ll need to change! Everything that gets compiled during a GPUArrays broadcast needs to be converted first, unless I misunderstand the problem.

Are you interested in Float64 → Float32 conversion, or in downconverting integer operations? For the former, you just need to make sure that your input data is in the right format and that your code is type-stable.

That doesn’t help if you call some existing function that does e.g. `a + 1.0`.

Yes, but that is not a GPU-specific problem, and not one that we can solve as a compiler optimization.

Generic code should never do that. `a + one(eltype(a))` always works.
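Concretely, with a throwaway example (`f_bad`/`f_good` are just illustrative names):

```julia
f_bad(a)  = a + 1.0              # Float64 literal forces promotion
f_good(a) = a + one(eltype(a))   # stays in the argument's precision

typeof(f_bad(1.0f0))   # Float64 - the Float32 input got promoted
typeof(f_good(1.0f0))  # Float32
```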

Yeah, generic code should also never contain bugs, but I guess it happens more often than we’d like in the end :stuck_out_tongue:

I tried to write a simple Cassette pass that always converts Float64 to Float32 whenever it encounters one in the arguments of a function call, but the code didn’t infer nicely - hopefully that’s just me being a novice Cassette user :wink:
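Roughly along these lines - just a sketch of the idea with placeholder names (`Convert32Ctx`, `to32`), not the exact code:

```julia
using Cassette

Cassette.@context Convert32Ctx

# Convert Float64 scalars appearing in call arguments; leave everything else alone.
to32(x::Float64) = Float32(x)
to32(x) = x

function Cassette.overdub(ctx::Convert32Ctx, f, args...)
    newargs = map(to32, args)
    if Cassette.canrecurse(ctx, f, newargs...)
        return Cassette.recurse(ctx, f, newargs...)   # rewrite nested calls too
    else
        return Cassette.fallback(ctx, f, newargs...)  # builtins/intrinsics we can't recurse into
    end
end

# The Float64 literal gets demoted before the multiplication runs:
Cassette.overdub(Convert32Ctx(), x -> 2.0 * x, 3.0f0)  # -> 6.0f0 (a Float32)
```

Converting blindly at every call is fragile (some callees genuinely need Float64) and adds work at every call site, which may be related to the inference problems.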

I want to convert to both Float32 and Int32. Lots of the operations are Ints working on arrays of Ints, and just passing in an array of Int32 doesn’t seem to work. The RAM savings of Int32 arrays alone would make it worthwhile.
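For example (assuming a 64-bit machine), plain Int literals widen Int32 results back to Int64:

```julia
a = Int32[1, 2, 3]

eltype(a .+ 1)         # Int64 - the literal 1 is an Int64, so the result widens
eltype(a .+ Int32(1))  # Int32 - only if every scalar in the expression is Int32
```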

I personally would never do `a + 1.0`, but who knows what users will do! Convention and sticking to `oneunit(x)` etc. is the best approach, but eventually just fixing it all with Cassette.jl might just work…

@sdanisch do you have a gist of that converter? I might have a go in the next few weeks.