I am using KernelAbstractions.jl to write a machine learning package that is device-agnostic. At one point in my algorithm, I end up with an NxNxn tensor on the device, let's call it T. I need to create a new vector, let's call it L, that contains the lower-triangular component of the Cholesky decomposition of each slice i: L[i] = cholesky(T[:,:,i]).L.data. Is there a way to implement this in KernelAbstractions.jl that doesn't involve me implementing my own Cholesky kernel? For clarity, here is a sequential example of the full problem:
using LinearAlgebra   # for cholesky and tr

logp = Vector{Float32}(undef, n)
for k in 1:n
    L = cholesky(T[:, :, k])
    a = L.U \ (L.L \ y)    # solve T[:,:,k] * a = y via the two triangular factors
    logp[k] = -0.5 * y' * a - tr(L.L) - N/2 * log(2pi)
end
return logp
Thank you!
We have a dense cholesky method for NVIDIA and AMD arrays. It should work if you just rely on the multiple dispatch of cholesky and ldiv!.
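For example, here is a minimal sketch of what the dispatch-based approach could look like with CUDA.jl; T_d and y_d are placeholders for device-resident copies of the T and y from the question, not library names, and the per-slice arithmetic is left as a comment:

using CUDA, LinearAlgebra

logp = Vector{Float32}(undef, n)
for k in 1:n
    A = T_d[:, :, k]              # slicing a CuArray yields a new device array
    F = cholesky(Hermitian(A))    # should dispatch to the CUSOLVER-backed factorization
    a = F \ y_d                   # the triangular solves also run on the device
    # ... accumulate logp[k] from y_d and a as in the sequential example
end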
Btw, last time I checked, L.L creates a copy. You can avoid this with UpperTriangular(L.factors)' (if L.uplo == 'U') or LowerTriangular(L.factors) (if L.uplo == 'L').
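For instance, a sketch reusing the F from the snippet above:

F = cholesky(Hermitian(A))
# F.L allocates a copy of the factor; these wrappers reuse F.factors in place:
L_view = F.uplo == 'U' ? UpperTriangular(F.factors)' : LowerTriangular(F.factors)

Both branches give a lower-triangular view of the buffer that cholesky already allocated, so no extra array is copied.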
Sorry, I should have made my question a little more clear. I have already been running my code on the GPU using multiple dispatch; I'm just concerned about the latency of spinning up n kernels. In my case, n is on the order of 10,000, while N is only on the order of 100, so I'm worried that launching the kernels is creating a lot of overhead. I guess the better question would have been this:
Is there some way to fuse the kernels for all n cholesky calls? The KernelAbstractions.jl docs just have an empty section on fusing kernels, so I was wondering if there is an undocumented way to go about this.
Thanks for your help!
This is a good suggestion, thanks! I'll check the performance improvement from this.