Using Zygote with custom CUDA kernels

Tullio seems like a good solution for what I’m doing, since Einstein summation is flexible enough to express my use case. It also seems to be quite fast and works on the GPU; here are some timings for the minimal example in the OP (a Zygote gradient sketch follows the timings):

julia> using CUDA, KernelAbstractions, CUDAKernels, Tullio

julia> x, y = CUDA.fill(1.0f0, 4), CUDA.fill(2.0f0, 4);

julia> tullio_mult(a, b) = @tullio out[i] := a[i] * b[i]
tullio_mult (generic function with 1 method)

julia> CUDA.@time CUDA.@sync tullio_mult(x, y);
  0.000394 seconds (353 CPU allocations: 8.219 KiB) (1 GPU allocation: 16 bytes, 2.92% gc time of which 69.57% spent allocating)

julia> CUDA.@time CUDA.@sync gpu_multiply(x, y);
  0.000316 seconds (355 CPU allocations: 5.891 KiB) (1 GPU allocation: 16 bytes, 5.63% gc time of which 80.34% spent allocating)
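
Since the thread is about getting Zygote to differentiate through a custom GPU kernel, the useful part is that @tullio also writes a gradient definition for the expression, so the same function should be differentiable with Zygote directly on CuArrays. A minimal sketch, continuing the session above (I haven't benchmarked the gradient pass):

using Zygote

# Tullio supplies the gradient rule for tullio_mult, so Zygote can differentiate
# a scalar loss built from it on the GPU arrays.
loss(a, b) = sum(tullio_mult(a, b))
da, db = Zygote.gradient(loss, x, y)   # expected to come back as CuArrays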

It seems fast for larger examples too:

julia> x, y = CUDA.fill(1.0f0, 2^30), CUDA.fill(2.0f0, 2^30);

julia> CUDA.@time CUDA.@sync tullio_mult(x, y);
  0.195727 seconds (77.10 k CPU allocations: 1.179 MiB, 69.54% gc time) (1 GPU allocation: 4.000 GiB, 69.59% gc time of which 0.01% spent allocating)

julia> CUDA.@time CUDA.@sync gpu_multiply(x, y);
  0.185908 seconds (98.58 k CPU allocations: 1.505 MiB, 67.56% gc time) (1 GPU allocation: 4.000 GiB, 67.60% gc time of which 0.01% spent allocating)
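
For reference, gpu_multiply above is the hand-written kernel from the OP, which isn't reproduced in this post. A rough sketch of what such an elementwise-multiply kernel typically looks like with CUDA.jl is below; the kernel body and launch configuration are my own assumptions and may differ from the OP's actual code:

using CUDA  # already loaded in the session above

# Hypothetical stand-in for the OP's gpu_multiply: a hand-written elementwise multiply.
function mul_kernel!(out, a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        @inbounds out[i] = a[i] * b[i]
    end
    return nothing
end

function gpu_multiply(a, b)
    out = similar(a)
    threads = 256
    blocks = cld(length(out), threads)
    @cuda threads=threads blocks=blocks mul_kernel!(out, a, b)
    return out
end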