Parallel computing and GPU support in the NeuralPDE.jl package

Aakhash_Sundaresan · October 15, 2023, 12:13pm

I’m trying to use the NeuralPDE.jl package to solve a system of PDEs. For some reason, performance is extremely slow with 9 separate neural networks, each with 3 hidden layers of 30 neurons. With 20 neurons, performance is decent (about 2 seconds per iteration), but as soon as I increase the number of layers or neurons, performance drops. My question is: is there a way to multithread the line res = Optimization.solve(prob, optimizer, maxiters = numIters)? As far as I understand, multithreading can only be applied to a for loop, and neither this example nor any other in the NeuralPDE documentation properly explains how to use multithreading or GPU support. GPU usage in particular is not well documented, and it causes a lot of confusion when choosing between the Flux and Lux implementations.
How can I optimize the example problems from the NeuralPDE documentation to run on a high-performance machine? To make use of the many cores or threads of an HPC system, how can I make the code run efficiently there?
Please kindly help me with this issue.

ChrisRackauckas · October 15, 2023, 5:53pm

I’ll start by saying PINNs are a pretty slow method for solving/training in general, so if you want something faster, you may want to use something like MethodOfLines.jl or DifferentialEquations.jl directly on most problems.

It should multithread automatically.

Did you load MKL (using MKL)? You’ll get better threaded performance with MKL than with OpenBLAS on most platforms.

What is your issue with the “Using GPUs” page of the NeuralPDE.jl documentation? As documented, all you have to do is put p on the GPU and everything else is automatic, so there’s nothing more to it than what’s shown in the docs. That said, a network of the size you’re describing almost certainly won’t be faster on a GPU. Did you smoke test some matmuls to see?
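A minimal CPU version of the matmul smoke test suggested above might look like the sketch below. It times a 30×30 weight matrix applied to batches of points, which is roughly the per-layer cost of the networks in this thread. Comparing against a GPU would mean repeating the timing with CUDA.jl arrays (CuArray) and wrapping the timed loop in CUDA.@sync; that part is an assumption and is not shown here.

```julia
using LinearAlgebra

# Average time (seconds) of one n×n by n×batch matrix multiply, the core cost of
# applying a Dense(n => n) layer to `batch` collocation points.
function time_matmul(n, batch; reps = 100)
    W = randn(n, n)
    X = randn(n, batch)
    Y = similar(X)
    mul!(Y, W, X)                # warm-up call so compilation is not timed
    t = @elapsed for _ in 1:reps
        mul!(Y, W, X)
    end
    return t / reps
end

for batch in (100, 1_000, 10_000)
    println("n = 30, batch = $batch: ", time_matmul(30, batch), " s per matmul")
end
```

If the GPU timings are not clearly lower than these for the batch sizes actually used in training, moving the PINN to the GPU is unlikely to help.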

Aakhash_Sundaresan · October 15, 2023, 9:37pm

Hi Chris,
Thanks for helping me out…
I’m curious: do we need to start Julia with multiple threads for it to use multithreading? And is it enough to just add using MKL, with everything else taken care of under the hood?
The problem I’m trying to solve is the RANS equations of fluid mechanics together with the energy equation, which has 9 variables. Hence I’m training 9 neural networks, each with just 20 neurons and 3 hidden layers. If I increase the number of neurons or hidden layers, performance drops. If I try to use the GPU, I get an out-of-memory error on a GPU with 48 GB of VRAM. Can you guide me on how to properly put the code onto a GPU on the HPC? I’m using Flux, and I move the model, the experimental training data, and the initial parameters to the GPU.
Please help me improve performance on the GPU.

ChrisRackauckas · October 15, 2023, 9:55pm

For Julia’s threads, you have to start Julia with the threads option (e.g. julia --threads=8). BLAS threads are already set.
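A quick way to check both thread pools from inside a session, assuming Julia 1.7 or newer (for BLAS.get_config and --threads=auto). Julia’s own threads and the BLAS library’s threads are configured separately, which is the distinction made above.

```julia
# Start Julia with several threads first, for example:
#   julia --threads=8      (or julia --threads=auto)
# Setting the JULIA_NUM_THREADS environment variable works as well.
using LinearAlgebra

println("Julia threads: ", Threads.nthreads())
println("BLAS threads:  ", BLAS.get_num_threads())
println("BLAS backend:  ", BLAS.get_config())   # lists the loaded BLAS library

# Loading MKL.jl (installed separately) switches the BLAS backend to MKL:
# using MKL
```

After using MKL, BLAS.get_config() should list MKL instead of OpenBLAS, which is an easy way to confirm the switch took effect.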

Share the network architecture.

Aakhash_Sundaresan · October 17, 2023, 4:33am

Here is the network architecture I’m currently using. I would be really grateful if you could help me figure out a way to speed up the computations.

using Flux

input = length(domains)

# One network: Dense(input => numNeurons), then numLayers hidden-to-hidden
# tanh layers, then a scalar output. Building the layers directly avoids
# generating source strings and calling eval(Meta.parse(...)).
make_net() = Flux.Chain(
    Flux.Dense(input => numNeurons, tanh),
    [Flux.Dense(numNeurons => numNeurons, tanh) for _ in 1:numLayers]...,
    Flux.Dense(numNeurons => 1),
)

# 9 networks (one per unknown field), converted to Float64 before any GPU transfer
chain = [f64(make_net()) for _ in 1:9]
if useGPU
    chain = chain |> gpu
end

# Note: for Flux chains, we need to destructure the network parameters in order to
# account for the way the Flux data structure is designed. The parameters are already
# Float64 (and on the GPU if useGPU), so no further conversion is needed.
init_params = [Flux.destructure(c)[1] for c in chain]

# Index ranges of each network's parameters inside one concatenated vector
acum = [0; accumulate(+, length.(init_params))]
sep = [(acum[i] + 1):acum[i+1] for i ∈ 1:(length(acum)-1)]

Aakhash_Sundaresan · October 17, 2023, 4:34am

numNeurons is 30 and numLayers is 3.
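With these sizes the parameter count can be worked out directly, which puts the 48 GB out-of-memory error in perspective. The sketch below assumes a 2D problem (input = length(domains) = 2); that value is hypothetical, since the thread does not state it.

```julia
# Trainable parameters of one network as built above:
# Dense(d => n) + numLayers × Dense(n => n) + Dense(n => 1), weights plus biases.
nparams(d, n, numLayers) = (d * n + n) + numLayers * (n * n + n) + (n + 1)

d = 2                                   # hypothetical: a 2D (x, y) problem
per_net = nparams(d, 30, 3)             # 2911
total = 9 * per_net                     # 26199 for the 9 networks
println(total, " parameters ≈ ", total * 8 / 1e6, " MB in Float64")
```

So the parameters themselves occupy roughly 0.2 MB. If that estimate holds for the real input size, the memory pressure presumably comes from elsewhere, such as the number of training points and the intermediate arrays created when differentiating the PDE residuals.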

Aakhash_Sundaresan · October 17, 2023, 4:36am

With numLayers = 3 and numNeurons = 30.
Also, you mentioned using the MethodOfLines.jl package, but I have experimental data that I’m using to solve the PDE. Is that possible with MethodOfLines.jl, Chris?

ChrisRackauckas · October 17, 2023, 7:21am

Did you try using Lux? That would decrease the cost due to removing the destructure/restructures.
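A sketch of the same architecture in Lux, in the style of NeuralPDE’s systems-of-PDEs tutorial, which builds a vector of Lux chains. The sizes below (input = 2) are hypothetical. As I understand the NeuralPDE docs, such a vector can be passed to PhysicsInformedNN in place of the Flux chains plus init_params, but check the current docs for the exact call.

```julia
using Lux, Random

# Hypothetical sizes: input = length(domains) = 2, numNeurons = 30, numLayers = 3.
input, numNeurons, numLayers = 2, 30, 3

# Same topology as the Flux version, built without eval/Meta.parse.
make_net() = Lux.Chain(
    Lux.Dense(input => numNeurons, tanh),
    [Lux.Dense(numNeurons => numNeurons, tanh) for _ in 1:numLayers]...,
    Lux.Dense(numNeurons => 1),
)
chains = [make_net() for _ in 1:9]   # one network per unknown field

# Lux models are stateless: parameters `ps` and state `st` live outside the model,
# so there is no destructure/restructure step when calling it.
rng = Random.default_rng()
ps, st = Lux.setup(rng, chains[1])
x = rand(rng, Float32, input, 5)     # 5 sample points
y, _ = chains[1](x, ps, st)          # y has size (1, 5)
```

Because the parameters are plain nested data rather than fields inside the model, there is no need for the acum/sep bookkeeping from the Flux version.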

Yes, what are you trying to do, solve an inverse problem?

Aakhash_Sundaresan · October 17, 2023, 8:01am

Not really an inverse problem, but something like using the experimental data and the physics to extrapolate the flow quantities close to the wall. Is this possible with MethodOfLines.jl?

Aakhash_Sundaresan · October 17, 2023, 8:02am

Do the destructuring and restructuring cause a performance issue?

ChrisRackauckas · October 17, 2023, 9:12am

It can definitely be, yes. Did you check profiles in your case? I’d just switch to Lux anyways to at least rule it out and keep things simple.
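A minimal sketch of checking a profile with Julia’s built-in Profile standard library. The workload function is a hypothetical stand-in; in practice one would profile a short training run, such as Optimization.solve(prob, optimizer, maxiters = 10), after one warm-up call, and look at how much time lands in destructure/restructure code compared with the actual layer evaluations.

```julia
using Profile, LinearAlgebra

# Hypothetical stand-in workload: repeated dense-layer evaluations.
function workload(n = 30, batch = 1_000, reps = 2_000)
    W = randn(n, n); X = randn(n, batch); Y = similar(X)
    for _ in 1:reps
        mul!(Y, W, X)
        Y .= tanh.(Y)
    end
    return sum(Y)
end

workload(30, 10, 1)            # warm-up so compilation is not profiled
Profile.clear()
@profile workload()
Profile.print(format = :flat, sortedby = :count)   # flat listing ordered by sample count
```

The flat format is often easier to scan than the default tree when the question is simply which functions dominate.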

ChrisRackauckas · October 17, 2023, 9:12am

Use a UDE formulation?

Aakhash_Sundaresan · October 17, 2023, 9:21am

Is there a tutorial for that, Chris? Can you point me to some resources where I can learn about it?

ChrisRackauckas · October 19, 2023, 10:01am

You’d follow the SciML tutorial “Automatically Discover Missing Physics by Embedding Machine Learning into Differential Equations”, but on a PDE discretization.
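A dependency-free sketch of what “UDE on a PDE discretization” can mean: discretize the PDE in space (method of lines) and add a learnable term to the resulting ODE right-hand side. Here missing_term is a hypothetical two-parameter placeholder; in a real workflow it would be a neural network, the right-hand side would go into an ODEProblem from DifferentialEquations.jl, and θ would be fitted to the experimental data. Those steps are assumptions and are not shown. For RANS, the learned term might play the role of an unknown closure such as the turbulent shear stress or heat flux.

```julia
# Semi-discretized 1D diffusion u_t = ν u_xx + g(u; θ) on interior grid points,
# with u = 0 at both ends. g stands in for the unknown physics a neural network
# would learn; here it is a hypothetical two-parameter placeholder.
missing_term(u, θ) = θ[1] * u + θ[2] * u^2

# In-place right-hand side with the (du, u, p, t) signature DifferentialEquations.jl uses.
function ude_rhs!(du, u, θ, t; ν = 0.01, dx = 0.1)
    n = length(u)
    for i in 1:n
        uL = i == 1 ? zero(eltype(u)) : u[i-1]   # Dirichlet boundary values
        uR = i == n ? zero(eltype(u)) : u[i+1]
        du[i] = ν * (uL - 2u[i] + uR) / dx^2 + missing_term(u[i], θ)
    end
    return du
end

# Explicit Euler time stepping, only to keep the sketch dependency-free.
# ν*dt/dx^2 = 0.1 here, inside the explicit stability limit of 0.5.
function simulate(u0, θ; dt = 0.1, steps = 100)
    u = copy(u0)
    du = similar(u)
    for k in 1:steps
        ude_rhs!(du, u, θ, (k - 1) * dt)
        u .+= dt .* du
    end
    return u
end

x = 0.1:0.1:0.9                  # 9 interior points, dx = 0.1
u0 = sin.(π .* x)
u_end = simulate(u0, [0.0, 0.0]) # θ = 0: pure diffusion, so the profile decays
```

The design point is that the known physics (the discretized diffusion operator) stays exact, and only the missing piece is learned, which is what distinguishes this from a PINN that learns the whole field.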

Aakhash_Sundaresan · October 19, 2023, 11:33am

I don’t understand any of the content you shared, Chris. I’m entirely new to this ecosystem; can you help me further, please?
Also, I tried to stick with the NeuralPDE framework. Why does my data loss not decrease below 1e-2? My predicted flow field captures the flow structure, but when I estimate the gradients at the wall from the PINN-predicted solution, they are quantitatively underestimated.
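How the wall gradient is extracted from the predicted field also matters. One option, sketched below, is a second-order one-sided finite difference on predictions sampled at the wall and at two points just above it; differentiating the network directly with automatic differentiation is the other obvious choice. The test function and step size h are hypothetical.

```julia
# Second-order one-sided difference for du/dy at the wall (y = 0), using values
# u0, u1, u2 sampled at y = 0, h, 2h (e.g. from the trained network's prediction):
#   du/dy|_{y=0} ≈ (-3u0 + 4u1 - u2) / (2h)
wall_gradient(u0, u1, u2, h) = (-3u0 + 4u1 - u2) / (2h)

# Check on u(y) = y^2 + 2y, whose exact wall gradient is 2; the formula is exact
# for quadratics, so this returns 2 up to rounding.
u(y) = y^2 + 2y
h = 1e-3
g = wall_gradient(u(0.0), u(h), u(2h), h)
```

Note that a finite difference only reports what the network predicts near the wall; if the network itself is inaccurate there, the gradient will be too.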

ChrisRackauckas · October 19, 2023, 11:41am

That’s a property of PINNs: they don’t tend to converge as far as normal PDE solvers.

Aakhash_Sundaresan · October 19, 2023, 11:49am

Then what is the solution for this, Chris? I’m stuck. Can you kindly guide me, please?

ChrisRackauckas · October 19, 2023, 12:02pm

Take a step back. What are you trying to do?

Aakhash_Sundaresan · October 19, 2023, 12:12pm

OK, I’ll explain the whole story; if possible, please guide me on this, Chris.
Imagine a flow over a heated flat plate placed inside a square duct. I acquire whole-field velocity and temperature data experimentally using optical techniques. The problem is that these optical techniques don’t have enough resolution to accurately capture the near-wall gradients. So I thought: the experimental data is accurate enough away from the wall, so I can use PINNs to estimate the flow structure close to the wall. First, does it make sense to use PINNs for that? I thought it did and proceeded, but I get poor convergence and underpredicted flow quantities close to the wall. I imposed appropriate boundary conditions in the NeuralPDE framework, but looking at the solution, the boundary conditions do not seem to be satisfied by the PINN-predicted fields. That said, the flow structures in the far-field region match the experimental data well. So the problems are:

Poor convergence, especially of the data loss term, which affects the total loss because total_loss = PDE_Loss + BC_Loss + DataLoss.
An underestimated solution field when predicting from the neural network.
A very slow process.
Can you please show me a path or shed some light on how to solve this efficiently? I would be really grateful, Chris.
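Written out, the objective in the first item is a plain sum of three terms, so a large term (for example a poorly satisfied boundary condition) competes directly with the data misfit. A commonly tried adjustment is to weight the terms differently; NeuralPDE’s documentation describes loss-weighting options, but the plain-Julia sketch below is only illustrative and is not NeuralPDE’s API. The mean-squared form of data_loss and the weight names are assumptions.

```julia
# Mean-squared misfit between predictions at measurement locations and the data
# (assumed form of DataLoss).
data_loss(pred, meas) = sum(abs2, pred .- meas) / length(meas)

# Weighted sum of the three terms; with all weights equal to 1 this is
# total_loss = PDE_Loss + BC_Loss + DataLoss. The weights are hypothetical knobs.
total_loss(pde, bc, data; w_pde = 1.0, w_bc = 1.0, w_data = 1.0) =
    w_pde * pde + w_bc * bc + w_data * data

total_loss(0.02, 0.5, 0.01)              # ≈ 0.53: unweighted
total_loss(0.02, 0.5, 0.01; w_bc = 10.0) # ≈ 5.03: boundary error now dominates
```

Raising w_bc pushes the optimizer toward satisfying the boundary conditions, at the possible cost of a larger data misfit elsewhere.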

Aakhash_Sundaresan · October 19, 2023, 12:48pm

I just came across one of your research articles where this was mentioned, and my problem is similar, Chris. I’m using the continuity, x- and y-momentum, and energy equations (RANS formulation for turbulent flows), with the turbulent shear stress and turbulent heat flux available from experiments. I tried to implement this in the NeuralPDE framework. Can you guide me on solving the problem with a UDE implementation? A few hints or pointers would help me a lot.
