Calculate associated Legendre polynomials on the GPU

Hi everyone,

I would like to evaluate the associated Legendre polynomials P_l^m(\cos{\theta}) on the GPU. I tried the packages AssociatedLegendrePolynomials.jl and LegendrePolynomials.jl, both of which work as intended for normal arrays but fail for CuArrays. The errors are different in each case and hard for me to interpret, so I am not sure whether GPU support simply isn't provided or whether I am doing something wrong myself. Is there a way to make this work with these packages, or should I take a different approach?

Thanks for the help. Below is an MWE.

using AssociatedLegendrePolynomials
using LegendrePolynomials
using Lux
using LuxCUDA

θ = range(0, π, 100)
θ_gpu = range(0, π, 100) |> gpu_device() .|> Float64

Plm1 = AssociatedLegendrePolynomials.Plm(1, 0, cos.(θ)) # works
Plm2 = LegendrePolynomials.Plm.(cos.(θ), 1, 0) # works and gives same result as above

Plm1_gpu = AssociatedLegendrePolynomials.Plm(1, 0, cos.(θ_gpu)) #throws error: Scalar indexing is disallowed

Plm2_gpu = LegendrePolynomials.Plm.(cos.(θ_gpu), 1, 0) # throws error:  InvalidIRError: compiling MethodInstance

Here are the exact errors:

julia> Plm1_gpu = AssociatedLegendrePolynomials.Plm(1, 0, cos.(θ_gpu))
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
julia> Plm2_gpu = LegendrePolynomials.Plm.(cos.(θ_gpu), 1, 0)
ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#34#36")(::CUDA.CuKernelContext, ::CuDeviceVector{…}, ::Base.Broadcast.Broadcasted{…}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to mpfr_exp)
Stacktrace:

If you don’t get a solution here, you may want to open issues on the package repositories.

I don’t know these packages (and I don’t do much on GPUs), though I do related things in SphericalFunctions.jl, so I can make some ~educated guesses.

TL;DR: I doubt that you're doing anything wrong; this is just a hard problem to solve on a GPU. You should think about whether you really need the polynomial values, or whether you could change your problem to use FFTs, for example.

First, I’ll point out that computing associated Legendre polynomials accurately is genuinely difficult. A naive implementation runs into extremely large numbers and catastrophic cancellation very quickly, so you basically have to use recurrence relations. Those are not a good fit for GPUs: GPUs excel at applying the same relatively simple arithmetic operations to many data points simultaneously, whereas a recurrence is a chain of steps where each one depends on the previous results.
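For example, the standard recurrence in l (for fixed m) is

(l - m) P_l^m(x) = (2l - 1) x P_{l-1}^m(x) - (l + m - 1) P_{l-2}^m(x),

so each P_l^m depends on the values at the two previous l, and you have to walk up in l (and similarly in m) one step at a time.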

The error from AssociatedLegendrePolynomials.Plm suggests that it is trying to iterate over some axis of an array, which is precisely what you need to do for recurrences. But again, GPUs don’t really like you iterating over array indices like that. Taking a quick peek at the code, I’m guessing this is the core function. You can see those for m in 0:mmax and for l in m+2:lmax statements, which are how you do the recurrences, and probably what CUDA objects to. Note that this code has lots of @simd blocks, which effectively “broadcast” over the indices of what you call θ, while the author used explicit loops for the (l, m) indices; I think that is the best you can hope for. Essentially, those @simd expressions are like GPU kernels, but you’d have to iterate over slices of CuArrays, at which point GPU launch overheads would almost surely kill performance.
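To make that concrete, here is a rough sketch of what such a recurrence looks like; this is my own illustration, not the package’s code, and it only covers the unnormalized m = 0 case. The loop over l is inherently sequential, while each step is vectorized over x, which is essentially the job those @simd loops do in the package:

function legendre_sketch(lmax, x::AbstractVector{T}) where {T}
    # Column l+1 holds P_l(x); rows follow the entries of x.
    P = zeros(T, length(x), lmax + 1)
    P[:, 1] .= one(T)                # P_0(x) = 1
    lmax >= 1 && (P[:, 2] .= x)      # P_1(x) = x
    for l in 2:lmax                  # sequential in l ...
        # Bonnet's recurrence: l P_l(x) = (2l - 1) x P_{l-1}(x) - (l - 1) P_{l-2}(x)
        P[:, l + 1] .= ((2l - 1) .* x .* P[:, l] .- (l - 1) .* P[:, l - 1]) ./ l
    end                              # ... but vectorized over x within each step
    return P
end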

As for LegendrePolynomials.Plm, you might notice the string mpfr in the error message. This generally indicates that something is using the MPFR library, which Julia uses to implement the BigFloat type. So somewhere under the hood, LegendrePolynomials is evidently using BigFloats. But MPFR does a lot of allocating and pointer-y stuff that’s just not going to work inside a GPU kernel.
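You can see part of the problem even without thinking about kernels: BigFloat is not an isbits type, so CUDA.jl can’t even store it in a CuArray, let alone call into libmpfr from device code. A tiny illustration (nothing to do with either package):

using CUDA
isbitstype(Float64)       # true:  plain bits, fine on the GPU
isbitstype(BigFloat)      # false: heap-allocated, pointer-backed MPFR data
# CuArray(BigFloat[1.0])  # would error; CuArray only supports inline-stored element types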

So I’m not at all surprised that those codes don’t work on GPU out of the box; it’s just not that kind of problem. However, a lot of applications of Legendre polynomials don’t actually need to evaluate the functions. For example, if you need spherical-harmonic transforms, you can change the problem to just use FFTs, which GPUs can do. Maybe that could work for you?
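If you do end up needing the values themselves, one pragmatic fallback (just a sketch based on your MWE) is to evaluate everything on the CPU, where the recurrences are cheap, and only move the finished array to the GPU:

using AssociatedLegendrePolynomials
using Lux, LuxCUDA

θ = range(0, π, 100)
Plm_cpu = AssociatedLegendrePolynomials.Plm(1, 0, cos.(θ))  # plain Array, computed on the CPU
Plm_gpu = Plm_cpu |> gpu_device()                           # transfer only the finished values

Whether that’s viable obviously depends on how often your θ values change.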


Hi @moble, thanks for the detailed answer.

I understand that using these packages on the GPU might be impossible after all.
The reason I want to evaluate the polynomials is that I have some coefficients from a prior decomposition (where I actually use FastSphericalHarmonics.jl, which is FFT-based), and I want to modify them and then use them to compute a function as a linear combination of spherical harmonics. Since the inputs to this function are not fixed on a grid, I was hoping that I could somehow use the approach in my OP.
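To make it concrete, the reconstruction I have in mind is roughly the following; the names are made up, and I’ve stripped it down to the real-valued, m ≥ 0 part without normalization, just to show the shape of the problem:

function reconstruct(θ, ϕ, c, lmax)
    f = zero.(θ)
    for l in 0:lmax, m in 0:l
        # This is the evaluation I was hoping to run on the GPU:
        P = AssociatedLegendrePolynomials.Plm(l, m, cos.(θ))
        f .+= c[l + 1, m + 1] .* P .* cos.(m .* ϕ)
    end
    return f
end

where θ and ϕ are arbitrary points rather than grid nodes, and c holds my modified coefficients.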

In any case, it seems that I have to reconsider and find some other way to implement my reconstruction function.