Parallel computing and GPU support in the NeuralPDE.jl package

I’m trying to use the NeuralPDE.jl package to solve a system of PDEs. For some reason, performance is extremely slow with 9 individual networks of 3 hidden layers and 30 neurons each. With 20 neurons the performance is decent (about 2 seconds per iteration), but the moment I increase the number of layers or neurons, it drops. My question is: is there a way to use multithreading for the line “res = Optimization.solve(prob,optimizer, maxiters = numIters)”? As far as I know, multithreading can only be applied to a for loop, and none of the examples in the NeuralPDE documentation explain how to use multithreading or GPU support properly. GPU usage in particular is not documented clearly, and the Flux and Lux implementation types cause a lot of confusion.
How can I optimize any of the example problems in the NeuralPDE documentation to run on a high-performance machine? How can I make the code use multiple cores or threads of an HPC system efficiently?
Please help me out with this issue…

I’ll start by saying PINNs are a pretty slow method for solving/training in general, so if you want something faster you may want to use something like MethodOfLines.jl or direct DifferentialEquations.jl on most problems.

It should multithread automatically.

Did you do using MKL? You’ll get better thread performance with MKL over OpenBLAS on most platforms.

What is your issue with the “Using GPUs” page of the NeuralPDE.jl docs? As documented, all you have to do is put p on a GPU and everything else is automatic, so there’s nothing else to it than what’s shown in the docs. That said, the size you’re talking about almost certainly won’t be faster on GPU. Did you smoke test some matmuls to see?
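A CPU-side smoke test of that kind could look like the sketch below. The layer width of 30 comes from the question; the batch of 4096 collocation points and the helper name `time_matmul` are assumptions made for illustration. The same timing could then be repeated with GPU arrays (for example through CUDA.jl) to compare.

```julia
using LinearAlgebra

# Hypothetical helper: average wall time of W * X over `reps` repetitions.
function time_matmul(W, X; reps = 100)
    W * X                       # warm-up call so compilation is not timed
    t = @elapsed for _ in 1:reps
        W * X
    end
    return t / reps
end

n_neurons = 30                  # layer width from the question
n_points = 4096                 # assumed batch of collocation points
W = rand(n_neurons, n_neurons)
X = rand(n_neurons, n_points)

println("BLAS threads: ", BLAS.get_num_threads())
println("seconds per matmul: ", time_matmul(W, X))
```

If the GPU version of this tiny product is not clearly faster, moving the whole training loop to the GPU is unlikely to pay off either.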

Hi Chris,
Thanks for helping me out…
I’m curious whether we need to start Julia with multiple threads for it to use multithreading. Also, is it enough to just run using MKL, with everything else taken care of under the hood?
The problem I’m trying to solve is the RANS equations in fluid mechanics together with the energy equation, which has 9 variables. Hence I’m training 9 neural networks with just 20 neurons and 3 hidden layers each. If I increase the number of neurons or hidden layers, the performance drops. If I try to use the GPU, I get an out-of-memory error on a GPU with 48 GB of VRAM… Can you guide me on how to properly put the code onto a GPU on the HPC? I have used Flux, and I move the model, the experimental training data, and the initial parameters to the GPU…
Please help me figure out how I can improve performance with the GPU.

For Julia’s threads, you have to start Julia with a thread count (for example, julia --threads=auto). For BLAS threads, it’s already set.
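A quick way to see both settings from inside a session is sketched below. It uses only the standard library; the value 4 passed to `BLAS.set_num_threads` is an arbitrary example, not a recommendation.

```julia
using LinearAlgebra

# Julia-level threads: fixed at startup, e.g. `julia --threads=8` or JULIA_NUM_THREADS=8.
julia_threads = Threads.nthreads()

# BLAS threads: independent of Julia's threads and adjustable at runtime.
blas_threads = BLAS.get_num_threads()

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads)

# Optionally change the BLAS thread count (4 is an arbitrary example value).
BLAS.set_num_threads(4)
```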

Share the network architecture.

Here is the network architecture that I’m currently using… I would be really grateful if you could help me figure out a way to speed up the computations…

input = length(domains)
# One network: input layer, numLayers hidden-to-hidden tanh layers, scalar output
make_chain() = Flux.Chain(Flux.Dense(input => numNeurons, tanh),
                          [Flux.Dense(numNeurons => numNeurons, tanh) for _ in 1:numLayers]...,
                          Flux.Dense(numNeurons => 1))
chain = [make_chain() for _ in 1:9]   # one network per dependent variable
if useGPU
    chain = chain |> gpu
end
chain = fmap(f64,chain)

# Note: for a Flux chain, we need to destructure the network parameters in order to account for the way the Flux data structure is designed
init_params = [Float64.(Flux.destructure(c)[1]) for c in chain]
if useGPU
    init_params = init_params |> gpu
end
acum = [0; accumulate(+,length.(init_params))]
sep = [(acum[i] + 1):acum[i+1] for i ∈ 1:(length(acum)-1)]

Here numNeurons is 30 and numLayers is 3.
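To put those sizes in perspective, the parameter count can be worked out by hand. The sketch below mirrors the layer layout of the chain above and repeats the `acum`/`sep` bookkeeping; the input dimension `n_inputs = 2` is an assumption, since `domains` is not shown, and `count_params` is a hypothetical helper.

```julia
# Parameters in one network: input layer, `num_layers` hidden-to-hidden layers, scalar output.
# `n_inputs = 2` below is an assumption; the real value is length(domains).
function count_params(n_inputs, num_neurons, num_layers)
    first_layer = n_inputs * num_neurons + num_neurons
    hidden = num_layers * (num_neurons * num_neurons + num_neurons)
    last_layer = num_neurons + 1
    return first_layer + hidden + last_layer
end

per_network = count_params(2, 30, 3)
lengths = fill(per_network, 9)            # nine identical networks

# Same index bookkeeping as `acum` and `sep` in the post.
acum = [0; accumulate(+, lengths)]
sep = [(acum[i] + 1):acum[i + 1] for i in 1:(length(acum) - 1)]

println("parameters per network: ", per_network)
println("total parameters:       ", acum[end])
println("first two ranges:       ", sep[1:2])
```

Under these assumed sizes the whole system has about 26 thousand parameters, which is small by GPU standards.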

Also, you mentioned the MethodOfLines.jl package, but I have experimental data that I’m using to solve the PDE… Is this possible with MethodOfLines.jl, Chris?

Did you try using Lux? That would decrease the cost due to removing the destructure/restructures.
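For reference, the same architecture written in Lux might look like the sketch below. It assumes the Lux package is installed and that there are two independent variables (the real value is `length(domains)`); the parameters come back as Float32 by default and are not converted to Float64 here.

```julia
using Lux, Random

input = 2          # assumed number of independent variables (length(domains) in the post)
numNeurons = 30
numLayers = 3

# Same layout as the Flux chain in the post, written directly in Lux.
make_chain() = Lux.Chain(Lux.Dense(input => numNeurons, tanh),
                         [Lux.Dense(numNeurons => numNeurons, tanh) for _ in 1:numLayers]...,
                         Lux.Dense(numNeurons => 1))

chain = [make_chain() for _ in 1:9]   # one network per dependent variable

# Lux keeps parameters outside the model, so no destructure/restructure step is needed.
rng = Random.default_rng()
ps, st = Lux.setup(rng, chain[1])

x = rand(Float32, input, 5)           # five sample points
y, _ = chain[1](x, ps, st)            # explicit-parameter forward pass
println(size(y))
```

Because the parameters are already a plain structure separate from the layers, the optimizer can work on them directly.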

Yes, what are you trying to do, solve an inverse problem?

Not really an inverse problem, but more like using the experimental data and the physics to extrapolate the flow quantities close to the wall… Is this possible with MethodOfLines.jl?

Do the destructuring and restructuring cause a performance issue?

It can definitely be, yes. Did you check profiles in your case? I’d just switch to Lux anyways to at least rule it out and keep things simple.
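A minimal profiling session with the standard-library profiler is sketched below. `fake_step` is a hypothetical stand-in for one training iteration; in practice it would be replaced by the real call under investigation, such as a short `Optimization.solve` run.

```julia
using Profile

# Stand-in for one expensive training step; replace with the real call being investigated.
function fake_step(n)
    A = rand(n, n)
    return sum(tanh.(A * A))
end

fake_step(10)                    # run once so compilation is not profiled
Profile.clear()
@profile for _ in 1:200
    fake_step(200)
end
Profile.print(maxdepth = 12)     # text report; deeper frames are truncated
```

If frames related to destructure or restructure dominate the report, that would support switching to Lux.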

Use a UDE formulation?

Is there any tutorial for that, Chris? Can you point me to some resources where I can learn about it, please?

You’d follow the “Automatically Discover Missing Physics by Embedding Machine Learning into Differential Equations” tutorial in the SciML docs (Overview of Julia's SciML), but on a PDE discretization.
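To make “a UDE on a PDE discretization” a bit more concrete, here is a rough standard-library sketch. It is not the tutorial’s code: it assumes a 1D reaction-diffusion equation u_t = D u_xx + f(u) with zero Dirichlet boundaries, discretizes the known diffusion term with finite differences, and lets a tiny hand-written network (`TinyMLP`, a hypothetical name) stand in for the unknown term f. No training is done here.

```julia
using Random

# Tiny hand-written MLP standing in for the unknown physics term (illustrative only).
struct TinyMLP
    W1::Matrix{Float64}; b1::Vector{Float64}
    W2::Matrix{Float64}; b2::Vector{Float64}
end
TinyMLP(rng, hidden) = TinyMLP(0.1 .* randn(rng, hidden, 1), zeros(hidden),
                               0.1 .* randn(rng, 1, hidden), zeros(1))
(m::TinyMLP)(u::Real) = (m.W2 * tanh.(m.W1 * [u] .+ m.b1) .+ m.b2)[1]

# Semi-discretized right-hand side: known diffusion (finite differences) + learned closure.
# Assumes u = 0 at both boundaries and a uniform grid with spacing dx.
function rhs!(du, u, nn, D, dx)
    N = length(u)
    for i in 1:N
        left = i == 1 ? 0.0 : u[i - 1]
        right = i == N ? 0.0 : u[i + 1]
        du[i] = D * (left - 2u[i] + right) / dx^2 + nn(u[i])
    end
    return du
end

rng = MersenneTwister(0)
nn = TinyMLP(rng, 8)
N = 50; dx = 1 / (N + 1); D = 0.01
x = dx .* (1:N)
u = sin.(pi .* x)                 # initial condition
du = similar(u)

# One explicit Euler step; in a real UDE an ODE solver and an optimizer would take over here.
dt = 0.4 * dx^2 / D
u_next = u .+ dt .* rhs!(du, u, nn, D, dx)
println(maximum(abs, u_next))
```

The point of the structure is that the solver handles the known physics, while the network only has to learn the missing term from data.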

I’m not understanding any of the content you shared, Chris… I’m entirely new to this ecosystem… Can you help me further, please?
Also, I tried to stick with the NeuralPDE framework. Why does my data loss not decrease below 1e-2? My predicted flow field captures the flow structure, but when I estimate the gradients at the wall from the PINN-predicted solution, the values are quantitatively underestimated…

That’s a property of PINNs. They don’t tend to converge as far as normal PDE solvers.

Then what is the practical solution for this, Chris? I’m stuck… Can you kindly guide me on that, please?

Take a step back. What are you trying to do?

OK, I can explain the entire story, and if possible please guide me in this regard, Chris…
What I’m trying to do is this: imagine a flow over a heated flat plate placed inside a square duct. I experimentally acquire the whole-field velocity and temperature data using optical techniques. The problem is that the optical techniques do not have enough resolution to accurately capture the near-wall gradients. So I thought: I have experimental data that is accurate enough away from the wall, and I can use PINNs to estimate the flow structure close to the wall. Firstly, does it make sense to use PINNs for that? I thought it did and proceeded, but I get poor convergence and underpredicted flow quantities close to the wall. I gave appropriate boundary conditions in the NeuralPDE framework, but looking at the solution, the boundary conditions do not seem to be satisfied in the PINN-predicted solution fields. I must say, though, that the flow structures in the far-field region match the experimental data well. So the list of problems is:

  1. Poor convergence, especially of the data loss term, which affects the total loss because my total_loss = PDE_Loss + BC_Loss + DataLoss
  2. Underestimated solution field when predicting from the neural network.
  3. Very slow process…

Can you please point me to a path or shed some light on how to solve the problem efficiently? I would be really grateful to you, Chris…


I just came across one of your research articles where this was mentioned. My problem is similar to that one, Chris. I’m using the continuity, x- and y-momentum, and energy equations (RANS formulation for turbulent flows), and I have data for the turbulent shear stress and turbulent heat flux from experiments. I tried to implement this in the NeuralPDE framework. Can you guide me on solving the problem with the UDE implementation? Just a few hints or directions would help me solve the problem, please.