I also want to compare against the same workload run through solve with EnsembleThreads() instead. In particular, I want to see that work is being sent to all my CPUs. Is there a way to see this in real time? I checked top expecting to see multiple instances of Julia running, but I didn’t see that.
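For reference, here is roughly the shape of the call I mean — a minimal sketch with the Lorenz system as a stand-in for my actual model (the real problem and parameters differ):

```julia
using OrdinaryDiffEq

# Toy stand-in for my actual model: the Lorenz system with Float32 state.
function lorenz!(du, u, p, t)
    du[1] = p[1] * (u[2] - u[1])
    du[2] = u[1] * (p[2] - u[3]) - u[2]
    du[3] = u[1] * u[2] - p[3] * u[3]
end

u0 = Float32[1.0, 0.0, 0.0]
p  = Float32[10.0, 28.0, 8 / 3]
prob = ODEProblem(lorenz!, u0, (0.0f0, 100.0f0), p)

# Each trajectory gets slightly perturbed parameters.
prob_func = (prob, i, repeat) -> remake(prob, p = p .* (0.9f0 .+ 0.2f0 .* rand(Float32, 3)))
ensemble_prob = EnsembleProblem(prob, prob_func = prob_func)

# The call I'd like to watch spread across all my CPUs:
sol = solve(ensemble_prob, Tsit5(), EnsembleThreads(); trajectories = 100_000, saveat = 1.0f0)
```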
If you are using NVIDIA GPUs, you can check nvidia-smi in a console, which reports how much memory each process is using (similar to top, but for the GPU). Usually this is enough for me, but there are more sophisticated tools available for CUDA.
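If you want to keep an eye on it while the solve runs, something like this in a second terminal or second Julia session works (the query flags are standard nvidia-smi options):

```julia
# Poll GPU utilization and memory once a second while the solve runs elsewhere.
# (Equivalent to running `watch -n 1 nvidia-smi` in a shell.)
while true
    run(`nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader`)
    sleep(1.0)
end
```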
Do you have any ideas about the CPUs? top shows me the % used by each process, but I have a 40-core processor that I’m testing on and I only see ~500% usage when I’d expect to see >3000%…
I am not 100% sure, but the diagnostic tools external to Julia are usually not totally accurate. Maybe someone else knows a good tool for profiling.
Personally, I test the scaling of functions to know what I can expect, which is done with some simple benchmarking. Either I change the number of threads used and look at the speedup for the same workload, or I change the problem size to see how much more performance I can get with more cores.
For example, if you were to plot the parallel speedup vs problem size, you would expect to see something like this, which hopefully shows a speedup equal to the number of threads for large enough problem sizes:
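To make that concrete, here is a rough sketch of the kind of benchmark I mean, assuming an ensemble problem like the one in your post (ideally run each call once beforehand so compilation doesn’t pollute the timings):

```julia
using OrdinaryDiffEq

# Compare serial vs threaded solves over increasing problem sizes.
for n in (1_000, 10_000, 100_000)
    t_serial  = @elapsed solve(ensemble_prob, Tsit5(), EnsembleSerial();  trajectories = n, saveat = 1.0f0)
    t_threads = @elapsed solve(ensemble_prob, Tsit5(), EnsembleThreads(); trajectories = n, saveat = 1.0f0)
    println("trajectories = $n, speedup ≈ ", round(t_serial / t_threads, digits = 2))
end
```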
Those last two calls will definitely use GPUs and you’ll see it in nvidia-smi calls via the utilization percentages. Though note those examples will only run for like <1 second IIRC, so you’ll need to be fast (or make the example bigger)
Running multiple instances is different from running multiple threads. Multiple instances would be the result of multiprocessing, i.e. using Distributed. If you’re using top, you’ll just see >100% CPU utilization for the one Julia process (using htop is usually a lot nicer for investigating this kind of thing).
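It’s also worth double-checking that Julia actually has the threads available, e.g. by starting it with `julia --threads=40` (or setting JULIA_NUM_THREADS) and then verifying:

```julia
# If this prints 1, EnsembleThreads() has nothing to fan out to,
# and top will never show much more than ~100% for the solve itself.
println(Threads.nthreads())
```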
I modified the penultimate line so that trajectories=1000000, saveat=200.0f0.
I checked nvidia-smi and saw nothing.
The last call, with GPUTsit5(), didn’t load. I am using the DiffEqGPU library, so I’m unsure why this is giving me an error:
UndefVarError: GPUTsit5 not defined
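I’m guessing I should check which DiffEqGPU version is installed and whether it actually exports GPUTsit5 (I’m assuming it’s supposed to come from DiffEqGPU), something like:

```julia
using Pkg
Pkg.status("DiffEqGPU")                 # which version is actually installed?

using DiffEqGPU
@show isdefined(DiffEqGPU, :GPUTsit5)   # false would explain the UndefVarError
```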
I kept re-running nvidia-smi while solve was running and kept seeing nothing, so I terminated the solve call. Does solve need to complete for me to see something in nvidia-smi?
Since this exchange, I’ve been letting solve run without terminating it (trajectories=1000000, saveat=200.0f0). The call has been running for at least 10 minutes and I still don’t see anything in nvidia-smi.
I don’t know how compilation is supposed to work in Julia. Am I waiting for Julia to compile 10^6 kernels (I think that’s the right term) to send to the GPU?
You probably want to use v1.8.1 with this. I haven’t tested on LTS, but I know v1.8.1 should be fine. If that works, I can probably pin down what’s going on with LTS, but most things using advanced compiler toolchains run best with the latest release.
The same thing is occurring with 1.8.1: solve hasn’t completed after more than 10 minutes with trajectories=1000000, saveat=200.0f0, and nvidia-smi shows nothing. Any ideas?
Update: it took half an hour to run the first time. I tried running it a second time, in case something cached from the first run reduced the compilation time. I checked nvidia-smi again and saw nothing, and the second run took about the same time.
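For reference, the kind of check I was attempting looks roughly like this — sketched with the CPU ensemble shape from my earlier post, but the same idea should apply to the GPU call:

```julia
# A tiny run first, to pay the compilation cost.
@time solve(ensemble_prob, Tsit5(), EnsembleThreads(); trajectories = 10, saveat = 1.0f0)

# Then the full-size run: if this is still slow, the time is in solving, not compiling.
@time solve(ensemble_prob, Tsit5(), EnsembleThreads(); trajectories = 1_000_000, saveat = 1.0f0)
```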