Parallel computing and GPU support in the NeuralPDE.jl package

This is not a question about NeuralPDE.jl. You will get better answers if you ask your questions better: if you instead ask for help with your CUDA installation, you will get everyone who knows CUDA to help you, a much larger set of people than those who know both physics-informed neural networks and CUDA.

You have not given enough information to solve this question anyway. Did you `]add CUDA`? When you did that, which CUDA drivers did it find? What GPU do you have?
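(For later readers: a minimal way to gather that information with CUDA.jl looks something like the following sketch; `CUDA.functional()` and `CUDA.versioninfo()` are standard CUDA.jl calls.)

```julia
using CUDA

CUDA.functional()   # true if the driver/runtime can actually be used
CUDA.versioninfo()  # prints driver, toolkit, and detected GPU details
```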

Hi @ChrisRackauckas, extremely sorry for posting the irrelevant question here, apologies. I was having the exact same problem as outlined in "NeuralPDE features and GPU compatibility". Now that I have shifted to Flux, that problem seems to be solved.

However, I ran a matmul smoke test with `x = rand(100000, 100000)` and `y = rand(100000, 100000)`: I got 11.2 s of computation time on the CPU and an out-of-memory error on the GPU (Nvidia RTX 3090, 24 GB VRAM). With the matrix size reduced by an order of magnitude, the CPU was still faster than the GPU. So does that mean the GPU doesn't perform well?

I have 9 neural networks to predict the 9 variables in my governing equations, and I'm unable to improve the performance of NeuralPDE for my specific problem. Is it possible to have a single neural network with 9 output neurons, instead of creating a separate network for each variable, within the NeuralPDE framework? As you advised, I visited the Chromatography repo, but their problem formulation is entirely different from mine and I don't know how to adapt it to my specific case. What should I do? Please help me out. Thanks a lot in advance, Chris.
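(For later readers, a sketch of that smoke test with two caveats that affect the comparison: a fair GPU timing needs `CUDA.@sync`, otherwise you only time the kernel launch, and GPUs are far faster in Float32 than Float64. The size `n` here is an assumption chosen to actually fit in memory.)

```julia
using CUDA, BenchmarkTools

n = 10_000                     # 100_000^2 Float64 entries would be ~80 GB, far above 24 GB of VRAM
x = rand(Float32, n, n)
y = rand(Float32, n, n)

@btime $x * $y                 # CPU matmul

xd, yd = cu(x), cu(y)          # copy to the GPU (cu converts to Float32 CuArrays)
@btime CUDA.@sync $xd * $yd    # GPU matmul; @sync waits for the kernel to finish
```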

Yes, that is too big to fit onto any GPU that exists today: a 100000×100000 Float64 matrix is 100,000² × 8 bytes = 80 GB, more than three times your 24 GB of VRAM.

That is, if your matrices are actually that size. Are you using neural networks with layers of size 100000? The documentation doesn't.

Nope, I'm using 9 neural networks with 3 hidden layers and 40 neurons in each layer. So how would I do the matmul smoke test for this case? What size of M×N should I use, and how do you calculate that?
What are the factors that affect the performance of the NeuralPDE framework? Does it have any limit on the number of equations and BCs that can be used, or on the maximum number of layers and neurons one can use? Or on the maximum order of the differential equation?

Also, is it that GridTraining is faster on GPUs and quasi-random sampling isn't? The documentation page says to use GridTraining only for testing purposes. Does that mean the GridTraining strategy cannot be used for final training, or what is the exact reason @ChrisRackauckas?

Test 40×N matmuls. N would be the batch size in the sampling, i.e. the number of samples. So, for example, 40×100.
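(A minimal sketch of that test, assuming a 40-neuron hidden layer and a batch of 1000 sample points, both numbers taken from this thread. At sizes this small, kernel launch and transfer overhead usually dominate, so it would not be surprising for the CPU to win.)

```julia
using CUDA, BenchmarkTools

W = rand(Float32, 40, 40)     # one hidden layer's weight matrix
X = rand(Float32, 40, 1000)   # a batch of 1000 sampled points

@btime $W * $X                # CPU

Wd, Xd = cu(W), cu(X)
@btime CUDA.@sync $Wd * $Xd   # GPU; @sync so the full kernel is timed
```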

There is no maximum order, though physical equations don't tend to go above 4, and I think that's where most of the hard-coded extra optimizations stop. There is no limit on the number of equations, and you generally scale linearly with that. You just scale quadratically with larger layer sizes (or think cubically once the number of samples grows along with them), which is a fundamental limitation coming from the matmuls in the neural networks.

GridTraining does not overcome the curse of dimensionality, does not hit random points so it tends not to give great results between grid points, and it converges more slowly than a quasi-random low-discrepancy sampler.
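(For concreteness, here is roughly how the two strategies are specified in NeuralPDE; keyword names follow the current docs, and the grid spacing and point count below are placeholders, not recommendations.)

```julia
using NeuralPDE, QuasiMonteCarlo

# Fixed grid with spacing dx; fine for quick sanity checks, not for final training
strategy_grid = GridTraining(0.05)

# Low-discrepancy sampling of the domain; generally the better choice for real runs
strategy_qr = QuasiRandomTraining(1000; sampling_alg = LatinHypercubeSample())
```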

Thanks for your guidance. I had tested the matmuls on the GPU with both Flux and Lux, and they do perform better than the CPU there. But in the NeuralPDE framework it's just very slow for some reason, and I'm not able to figure out the problem. I'm using 40 neurons, 3 hidden layers, QuasiRandomTraining with 1000 points and the `LatinHypercubeSample()` strategy, with LBFGS, for the RANS equations with energy (so 4 equations in total) and 9 boundary conditions. I have 9 variables to be predicted, so there are 9 neural networks. @ChrisRackauckas Maybe you can estimate the size of the parameters and the expected performance from this. It's like the iterations are suddenly fast and then it hangs for maybe 5 minutes or so. Is this due to stiffness in the loss landscape? If so, how can I overcome this issue?
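(For later readers, a sketch of what that setup looks like in Lux. The input dimension `n_inputs` is an assumption here; it should be the number of independent variables in the PDE system. As in the NeuralPDE systems tutorials, the vector of chains would then be passed as the chain argument to `PhysicsInformedNN`.)

```julia
using Lux

n_inputs = 3   # assumption: number of independent variables (e.g. x, y, t)

# 9 networks, each with 3 hidden layers of 40 tanh neurons and 1 output
chains = [Chain(Dense(n_inputs => 40, tanh),
                Dense(40 => 40, tanh),
                Dense(40 => 40, tanh),
                Dense(40 => 1)) for _ in 1:9]
```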

@ChrisRackauckas Can you comment on the above issue, please?

That's just due to line search failures. Indeed, the stiffness gives PINNs a problem, so PINNs aren't great for RANS equations.
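(One common workaround, sketched here under the assumption that `prob` is the `OptimizationProblem` returned by NeuralPDE's `discretize`: take the rough early steps with Adam, then hand the warm-started parameters to LBFGS, which is much less prone to line search failures once the loss is locally smoother. This is a standard two-stage recipe, not a guaranteed fix for RANS.)

```julia
using Optimization, OptimizationOptimisers, OptimizationOptimJL

# Stage 1: Adam to get into a reasonable basin of the loss landscape
res1 = solve(prob, OptimizationOptimisers.Adam(0.01); maxiters = 2000)

# Stage 2: LBFGS from the warm start for fine convergence
prob2 = remake(prob; u0 = res1.u)
res2 = solve(prob2, OptimizationOptimJL.LBFGS(); maxiters = 1000)
```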

@ChrisRackauckas So what's the alternative when I have experimental data that is not close to the wall and I want to accurately extrapolate to the wall using ML?

@ChrisRackauckas Can you please comment on the above?