Big PDESystem exceeds maximal parameter space of CUDA kernels

drsk · July 1, 2023, 9:30am

I’m using NeuralPDE.jl to find critical points of a geometric flow equation on the four-torus. This ends up being a fairly big system, with 18 variables in total. I think the size of the system makes the CUDA compiler generate kernels whose arguments exceed the maximum kernel parameter space, i.e. I’m hitting the issue described here: Guard against exceeding maximum kernel parameter size · Issue #32 · JuliaGPU/CUDA.jl · GitHub. Is there a way to avoid this, maybe an option for the CUDA compilation? Or is this a bug in the way the kernel gets assembled from the PDESystem in the first place?

maleadt · July 3, 2023, 7:47am

There’s a suggestion in that issue: pass the large argument as a one-element array.
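
To get a rough feeling for when an argument is "large", one can look at `sizeof` of the isbits value that would be passed by value. The following is only a sketch in plain Julia (no GPU needed). The 4 KiB limit is an assumption based on the classic CUDA kernel parameter space size (newer toolkits and GPUs may allow more), and `big_arg`, `small_arg` and `fits_by_value` are hypothetical names that stand in for whatever actually ends up in the kernel signature.

```julia
# Assumed limit: 4 KiB is the classic size of the CUDA kernel parameter space.
# Newer CUDA toolkits and GPUs may allow more, so treat this as a rough guide.
const ASSUMED_PARAM_LIMIT = 4096

# Hypothetical stand-ins for kernel arguments that are passed by value.
small_arg = (a = 1.0, b = (2.0f0, 3.0f0))
big_arg = (weights = ntuple(i -> Float64(i), 600), bias = 0.0)

# An isbits value is copied into parameter space as a whole,
# so its `sizeof` is what counts against the limit.
fits_by_value(x) = isbitstype(typeof(x)) && sizeof(x) <= ASSUMED_PARAM_LIMIT

println(sizeof(small_arg), " bytes, fits: ", fits_by_value(small_arg))
println(sizeof(big_arg), " bytes, fits: ", fits_by_value(big_arg))
```

Note that the real kernel arguments are the device-side versions of what is passed to `@cuda`, so the host-side `sizeof` is only an approximation.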

drsk · July 3, 2023, 8:32am

Thanks for the answer, @maleadt! I did see the suggestion, but I don’t know how I can control the generated CUDA code. Is there a way I can modify it after the automatic generation? Or are you suggesting finding whatever is generating these huge tuple arguments and patching that library?

maleadt · July 4, 2023, 5:48am

Yes, the latter. You can’t just patch up the generated code, as changing the argument from a value type to a reference type changes the calling convention (i.e. it requires you to pass a pointer to memory, e.g., by using an Array).
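
As a minimal sketch of what that change looks like in a hand-written kernel (this is not what NeuralPDE.jl or Lux.jl generate, and `big`, `copy_first!` and `run_boxed_demo` are made-up names for illustration): the kernel takes a one-element `CuArray` holding the tuple instead of the tuple itself, so only a small device array descriptor ends up in the kernel parameters.

```julia
using CUDA

# Hypothetical large argument: 1100 Float32 values, about 4.3 KiB of isbits data.
# Passed directly to a kernel, the whole tuple would count as a parameter.
big = ntuple(i -> Float32(i), 1100)

# The kernel receives a one-element device array instead of the tuple itself.
function copy_first!(out, boxed)
    i = threadIdx().x
    t = boxed[1]                 # load the tuple from GPU memory
    if i <= length(out)
        @inbounds out[i] = t[i]  # each thread copies one tuple element
    end
    return nothing
end

function run_boxed_demo()
    out = CUDA.zeros(Float32, 32)
    boxed = CuArray([big])       # upload the tuple as a one-element array
    @cuda threads=32 copy_first!(out, boxed)
    return Array(out)            # copy the result back to the host
end

# Only launch when a working GPU is available.
if CUDA.functional()
    println(run_boxed_demo())
end
```

Both the kernel signature and the call site have to change together, which is why this has to happen in the code that builds the arguments rather than in the generated code.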

drsk · July 4, 2023, 7:26am

That makes sense. I can see that Lux.jl is creating these named tuples, but it could also be that ComponentArrays.jl introduces the problem. The offending line of code is:

[Lux.setup(Random.default_rng(), c)[1] |> ComponentArray |> gpu .|> Float64 for c in chains]

I’ll try to pinpoint it. I’m a bit surprised that I’m the first one to hit this problem, as this seems to be a fairly common way of using Lux.jl parameters on the GPU.

drsk · July 5, 2023, 1:20pm

This was caused by a particularly complicated PDE, which made NeuralPDE.jl compile a fairly complicated loss function. I simplified the constraints by reformulating them into several smaller ones. This solved the problem for me. I’m not sure yet whether I’m paying a performance penalty for this, though.

ChrisRackauckas · July 5, 2023, 1:27pm

Worth opening an issue to track this.

drsk · July 6, 2023, 6:48am

@ChrisRackauckas I think this is essentially a CUDA.jl problem and not so much a NeuralPDE.jl one. The issue is tracked here: Guard against exceeding maximum kernel parameter size · Issue #32 · JuliaGPU/CUDA.jl · GitHub.
