Checking that work is being sent to processors: GPU vs Multiple CPUs

In the Parameter Parallelism section of the DiffEqGPU.jl repo’s documentation there is some example code, which I got to work. I want to check that it’s actually running on the GPU’s processing units. Is there some way to see that the workload is running on the individual processing units?

I also want to compare this against running solve with EnsembleThreads() instead. In particular, I want to see that work is being sent to all of my CPU cores. Is there a way to see this in real time? I checked top expecting to see multiple instances of Julia running, but I didn’t see that.

If you are using NVIDIA GPUs, you can check nvidia-smi in a console, which shows how much memory each process is using (similar to top, but for the GPU). Usually this is enough for me, but there are more sophisticated profiling tools available for CUDA.

Do you have any ideas about the CPUs? top shows me the % CPU used by each process, but I’m testing on a 40-core processor and I only see ~500% usage when I would expect to see >3000%…

Also, when I check nvidia-smi I don’t see any processes.

I am not 100% sure but usually the diagnostic tools external to Julia are not totally accurate. Maybe someone else knows a good tool for profiling.

Personally, I test the scaling of functions to know what to expect, which just takes some simple benchmarking. Either I change the number of threads and look at the speedup for the same workload, or I change the problem size to see how much more performance the extra cores give.
For example, if you plot the parallel speedup vs. problem size, you would hope to see the speedup approach the number of threads for large enough problem sizes.

Obviously, you don’t need to go to this much effort, as you can just manually run some quick checks in the REPL.
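
For instance, a quick check might look something like the following (just a rough sketch with a toy ODE ensemble, not your exact workload; remember to start Julia with multiple threads, e.g. julia -t auto):

using OrdinaryDiffEq

# Toy scalar ODE; each trajectory gets a different random parameter
f(u, p, t) = p * u
prob = ODEProblem(f, 1.0, (0.0, 1.0), 1.01)
prob_func = (prob, i, repeat) -> remake(prob, p = rand())
ensemble = EnsembleProblem(prob, prob_func = prob_func)

# Same workload, serial vs. threaded; the ratio of the two times is the parallel speedup
@time solve(ensemble, Tsit5(), EnsembleSerial(), trajectories = 100_000)
@time solve(ensemble, Tsit5(), EnsembleThreads(), trajectories = 100_000)

Repeating that for a few different trajectory counts gives you the speedup-vs-problem-size curve mentioned above.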

This likely means that the program doesn’t use the GPU at all

uh oh – but this package is specifically designed to run on GPUs. The code I ran is an example they share with us.

What are you running?

using DiffEqGPU, OrdinaryDiffEq

# In-place Lorenz system
function lorenz(du,u,p,t)
    du[1] = p[1]*(u[2]-u[1])
    du[2] = u[1]*(p[2]-u[3]) - u[2]
    du[3] = u[1]*u[2] - p[3]*u[3]
end

u0 = Float32[1.0;0.0;0.0]
tspan = (0.0f0,100.0f0)
p = [10.0f0,28.0f0,8/3f0]
prob = ODEProblem(lorenz,u0,tspan,p)

# Each trajectory in the ensemble gets randomly scaled parameters
prob_func = (prob,i,repeat) -> remake(prob,p=rand(Float32,3).*p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy=false)

# Array-based GPU ensemble, then the kernel-generating GPU ensemble
@time sol = solve(monteprob,Tsit5(),EnsembleGPUArray(),trajectories=10_000,saveat=1.0f0)
@time sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(), trajectories = 10_000, adaptive = true, dt = 0.1f0, save_everystep = false)

Those last two calls will definitely use the GPU, and you’ll see it in nvidia-smi via the utilization percentages. Note, though, that those examples only run for something like <1 second IIRC, so you’ll need to be fast (or make the example bigger, as in the sketch below).
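
For example (a sketch reusing monteprob from above; the exact trajectory count is arbitrary), something like this should keep the GPU busy long enough to watch in another terminal:

# Larger ensemble so the GPU stays busy for several seconds;
# keep nvidia-smi running in another terminal while this executes
@time sol = solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 100_000, saveat = 1.0f0)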

Multiple instances are different from multiple threads. Multiple instances would be the result of multiprocessing, i.e. using Distributed. With multithreading, top will just show a single Julia process at >100% CPU utilization (htop is usually a lot nicer for investigating this kind of thing).
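
One thing worth double-checking (a quick sketch on my end, not specific to DiffEqGPU) is that your Julia session actually has the threads you expect, since EnsembleThreads() can only use what Julia was started with, e.g. via julia --threads=40:

# How many threads does this session have? EnsembleThreads() cannot use more than this.
Threads.nthreads()

# Hypothetical busy loop across all threads: watch htop while it runs
# and you should see that many cores pegged.
Threads.@threads for i in 1:Threads.nthreads()
    s = 0.0
    for j in 1:10^8
        s += sin(j)
    end
    println((Threads.threadid(), s))
end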

Ya I’m running this –

For the last two lines:

  • I modified the penultimate line so that trajectories=1000000, saveat=200.0f0.
    I checked nvidia-smi and saw nothing.

  • In the last line, GPUTsit5() didn’t load. I am using the DiffEqGPU library, so I’m unsure why this is giving me an error. I’m getting the error:
    UndefVarError: GPUTsit5 not defined

I did not know this – I’ll check htop, thanks!

How long did it take to solve? It can be hard to see with nvidia-smi if it only lasts a second.

What package version?

  1. I kept re-running nvidia-smi while solve was running and kept seeing nothing, so I terminated the solve call. Does solve need to finish running before I can see something in nvidia-smi?

  2. I’m running Julia 1.6.7. Is this what you mean?

Did you terminate it when it was compiling?

oh hm, maybe.

Since this exchange, I’ve been letting solve run without terminating it (trajectories=1000000, saveat=200.0f0). The call has been running for at least 10 minutes now, and I still don’t see anything in nvidia-smi.

I don’t know how compilation is supposed to work in Julia. Am I waiting for Julia to compile 10^6 kernels (I think that’s the right term) to send to the GPU?

That’s odd. It should be a few seconds or so.

It just compiles one kernel function.
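
If you want to separate the compilation time from the actual solve time, one option (just a sketch, reusing monteprob and the rest of the setup from the example above) is to warm up with a tiny ensemble first:

# Tiny run to trigger kernel compilation once
solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 10, saveat = 1.0f0)

# This timing should now reflect GPU work rather than compilation
@time solve(monteprob, Tsit5(), EnsembleGPUArray(), trajectories = 100_000, saveat = 1.0f0)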

Is there a way to check if I have everything configured correctly?

I’m running Julia 1.6.7 (downloaded from the website and run directly from the bin folder). I then added the packages DiffEqGPU and OrdinaryDiffEq.

You probably want to use v1.8.1 with this. I haven’t tested on LTS at least, but I know v1.8.1 should be fine. If that works, I can probably pin down what’s going on with LTS, but most things using advanced compiler toolchains run best with the latest release.
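
As a sanity check on the GPU setup itself (a rough sketch; this assumes CUDA.jl is available in your environment, which you may need to add), you can confirm that CUDA.jl sees a working device and keep the GPU busy long enough to show up in nvidia-smi:

using CUDA

CUDA.functional()    # should return true if the driver, toolkit, and device are usable
CUDA.versioninfo()   # prints the detected driver, toolkit, and device

# Keep the GPU busy for a few seconds so it is easy to catch in nvidia-smi
a = CUDA.rand(8_000, 8_000)
for _ in 1:20
    CUDA.@sync a * a   # repeated matrix multiplies on the GPU
end

If CUDA.functional() returns false, or the loop above never shows up in nvidia-smi, the problem is in the CUDA setup rather than in DiffEqGPU.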

okay let me try this with 1.8.1

The same thing is occurring with 1.8.1: solve hasn’t completed after running for >10 minutes with trajectories=1000000, saveat=200.0f0, and nvidia-smi shows nothing. Any ideas?

Update: it took about half an hour to run the first time. I tried running it a second time, in case something cached from the first run reduced the compilation time, but it took about the same time, and I checked nvidia-smi again and saw nothing.