GPUArrays, 64- to 32-bit conversions, and Cassette.jl

I’m writing some frameworks that run arbitrary user code on the GPU, but I’m missing out on the performance gains of using only 32-bit numbers.

Is anyone else thinking about automatic 32/64-bit code transformations using Cassette.jl? Or other options?

I’m not sure if it’s the recommended option, but there is https://github.com/stevengj/ChangePrecision.jl
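If it helps, basic usage looks roughly like this (a minimal sketch based on the package README; it rewrites floating-point literals and calls like `rand` inside the block):

```julia
using ChangePrecision

@changeprecision Float32 begin
    x = 0.1        # this literal is parsed as Float32
    v = rand(100)  # rand defaults to Float32 inside the block
    sum(x .* v)    # stays Float32, as long as the code lives in this block
end
```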

Oh thanks I hadn’t seen that. But this might be an issue:

> Code hidden inside external functions that are called is not affected.

That’s often the code I’ll need to change! Everything that gets compiled during a GPUArrays broadcast needs to be converted first, unless I misunderstand the problem.

Are you interested in Float64 → Float32 conversion, or in downconverting integer operations? For the former, you just need to make sure that your input data is in the right format and that your code is type-stable.

That doesn’t help if you call some existing function that does e.g. `a + 1.0`.

Yes, but that is not a GPU-specific problem, and not one that we can solve as a compiler optimization.

Generic code should never do that. `a + one(eltype(a))` always works.
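Concretely, with a throwaway example (`f_bad`/`f_good` are just illustrative names):

```julia
f_bad(a)  = a + 1.0              # Float64 literal forces promotion
f_good(a) = a + one(eltype(a))   # stays in the argument's precision

typeof(f_bad(1.0f0))   # Float64 - the Float32 input got promoted
typeof(f_good(1.0f0))  # Float32
```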

Yeah, generic code should also never contain bugs, but I guess it happens more often than we’d like in the end :stuck_out_tongue:

I tried to write a simple Cassette pass that always converts Float64 to Float32 whenever it encounters one in the arguments of a function call, but the code didn’t infer nicely - hopefully that’s just me being a novice Cassette user :wink:
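Roughly along these lines - just a sketch of the idea with placeholder names (`Convert32Ctx`, `to32`), not the exact code:

```julia
using Cassette

Cassette.@context Convert32Ctx

# Convert Float64 scalars appearing in call arguments; leave everything else alone.
to32(x::Float64) = Float32(x)
to32(x) = x

function Cassette.overdub(ctx::Convert32Ctx, f, args...)
    newargs = map(to32, args)
    if Cassette.canrecurse(ctx, f, newargs...)
        return Cassette.recurse(ctx, f, newargs...)   # rewrite nested calls too
    else
        return Cassette.fallback(ctx, f, newargs...)  # builtins/intrinsics we can't recurse into
    end
end

# The Float64 literal gets demoted before the multiplication runs:
Cassette.overdub(Convert32Ctx(), x -> 2.0 * x, 3.0f0)  # -> 6.0f0 (a Float32)
```

Converting blindly at every call is fragile (some callees genuinely need Float64) and adds work at every call site, which may be related to the inference problems.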

I want to convert to both Float32 and Int32. Lots of the operations are Ints working on arrays of Ints, and just passing in an array of Int32 doesn’t seem to work. The RAM savings of Int32 arrays alone would make it worthwhile.
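For example (assuming a 64-bit machine), plain Int literals widen Int32 results back to Int64:

```julia
a = Int32[1, 2, 3]

eltype(a .+ 1)         # Int64 - the literal 1 is an Int64, so the result widens
eltype(a .+ Int32(1))  # Int32 - only if every scalar in the expression is Int32
```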

I personally would never do `a + 1.0`, but who knows what users will do! Convention and sticking to `oneunit(x)` etc. is the best approach, but eventually just fixing it all with Cassette.jl might just work…

@sdanisch do you have a gist of that converter? I might have a go in the next few weeks.