CUDA.jl 2.0: Per-thread streams, Float16, CUSPARSE clean-up

maleadt · October 2, 2020, 7:45am

Hi all,

I’ve just tagged and released CUDA.jl 2.0, with several new features: https://juliagpu.org/2020-10-02-cuda_2.0/

This release is slightly breaking because of the following changes:

per-thread streams: unlikely to break anything since few people are using threads with CUDA.jl
CUSPARSE clean-up: for example, switch2XXX methods are now convert methods
array dispatch changes: view/reinterpret/reshape are now represented using Base’s wrappers.

This last point isn’t technically breaking, but it’s likely that some methods that still dispatch on ::CuArray won’t be considered anymore now that, e.g., view(...) returns a SubArray{<:CuArray} instead of a CuArray. As a result, fall-back Base methods might get used instead of CUDA-specific implementations, triggering scalar iteration or invalid pointer conversions (GPU array to CPU pointer). The fix is to dispatch on DenseCuArray (if your method needs a CuPtr), StridedCuArray (for a CuPtr + strides) or AnyCuArray (for anything that can be used in a kernel). Please file issues if you encounter this with array operations from CUDA.jl or GPUArrays.jl.
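To make the dispatch problem concrete, here is a minimal sketch that runs in plain Julia without a GPU. Note that MockGPUArray, AnyMockGPUArray, which_path and which_path_fixed are hypothetical names made up for this illustration; they only mimic the shape of the problem and are not CUDA.jl's actual definitions of CuArray or AnyCuArray.

```julia
# Hypothetical stand-in for a GPU array type (NOT CUDA.jl's CuArray).
struct MockGPUArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(a::MockGPUArray) = size(a.data)
Base.getindex(a::MockGPUArray, i::Int...) = a.data[i...]

# Hypothetical union covering the array itself and views of it,
# similar in spirit to what AnyCuArray is meant to cover.
const AnyMockGPUArray{T,N} =
    Union{MockGPUArray{T,N}, SubArray{T,N,<:MockGPUArray}}

# Old-style method: only the concrete array type gets the fast path.
which_path(::AbstractArray) = :generic_fallback
which_path(::MockGPUArray)  = :gpu_specific

# Fixed method: dispatching on the union also catches wrapped arrays.
which_path_fixed(::AbstractArray)   = :generic_fallback
which_path_fixed(::AnyMockGPUArray) = :gpu_specific

a = MockGPUArray(rand(Float32, 4, 4))
v = view(a, :, 1)        # a SubArray wrapping the MockGPUArray

which_path(a)            # :gpu_specific
which_path(v)            # :generic_fallback (the wrapper misses the method)
which_path_fixed(v)      # :gpu_specific
```

In real code the analogous change would be to widen a ::CuArray signature to one of the three aliases above, depending on what the method body actually needs.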

Finally, since this is a breaking release, dependent packages like Flux.jl still need to be updated or bumped, so many users won’t be able to install CUDA.jl 2.0 just yet.

Palli · October 2, 2020, 4:11pm

initial support for Float16 [here assuming Julia’s type] CUBLAS wrappers can be used with (B)Float16 inputs […]
julia> using BFloat16s
[…] Alternatively, CUBLAS can be configured to automatically down-cast 32-bit inputs to Float16.

How does this compare to CUDA from other languages, C++, or indirectly from Python?

My understanding is that Nvidia has this new type, which makes life easy for C++ programmers who would otherwise have needed to replace lots of types.

Since you can downcast from Float32, I assume you can also downcast from Float64 (just not directly; maybe not a big worry, as nobody uses it anyway… for ANNs)?

Python libraries, e.g. PyTorch, have had some advantage over Julia and pure-Julia packages; is this likely to close the gap (once Flux etc. support this updated wrapper)? [You can also use PyTorch.jl/ThArrays.jl and bypass this wrapper. Is that something you would mix and match with the wrapper?]

maleadt · October 2, 2020, 4:46pm

That’s the global mode mentioned in the blog post that lets CUBLAS downcast; there’s not really a special type, at least not for the user. Automatic downcast from Float64 isn’t currently supported by CUBLAS, though.
