This is more of a conceptual question about CUDA integration (CUDA.jl) with Optimization.jl
Does the CUDA integration speed up both compilation AND solving time, or just solving time?
Does Optimization.jl support CUDA integration?
Note that Optimization.jl does not implement the solution algorithms. It just forwards the problem to a solver backend. In most cases, these solvers do not support GPUs.
If Optimization.jl does support CUDA and there is a solver that exploits GPUs, then I would expect the compilation time to be a little slower (because there is more work to do) and the solving time to be highly problem-dependent. In most cases, GPUs do not improve the performance of optimization algorithms. (For one exception, see https://github.com/sshin23/ExaModels.jl.)
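For concreteness, here is a minimal sketch of that forwarding, using the Rosenbrock example from the Optimization.jl docs with the Optim.jl backend (OptimizationOptimJL); everything here runs on the CPU:

```julia
# Minimal Optimization.jl usage: the problem is defined here, but the
# actual algorithm (NelderMead) comes from the Optim.jl backend.
using Optimization, OptimizationOptimJL

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

u0 = zeros(2)
p  = [1.0, 100.0]

prob = OptimizationProblem(rosenbrock, u0, p)
sol  = solve(prob, NelderMead())   # runs on the CPU, like most backends
```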
Yes, it’s used in a lot of examples throughout the ecosystem:
https://docs.sciml.ai/NeuralPDE/stable/tutorials/gpu/
Note that we do have a set of solvers coming out soon that will use GPUs in some nice ways. That’s about 3 months out though, aiming for before JuliaCon.
Generally it depends, but usually the compilation time is a little higher since most of the time there are at least some kernels to build due to broadcast. But there’s some work in Julia v1.11 on caching more of this, so that should be helpful in the future.
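A small illustration of that broadcast-kernel cost with plain CUDA.jl (the timings are illustrative only):

```julia
using CUDA

x = CUDA.rand(1_000)
@time CUDA.@sync x .* 2 .+ 1   # first call: pays for kernel compilation
@time CUDA.@sync x .* 2 .+ 1   # second call: just the kernel launch
```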
Extremely dependent on the problem. Generally you need to have some kind of O(n^2) or O(n^3) behavior for it to make sense: matrix multiplications (neural networks), LU-factorizations (stiff ODE solves), or something of the sort. If it’s a bunch of O(n) operations, I wouldn’t expect magic.
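A rough sketch of that scaling argument, assuming CUDA.jl and a large enough matrix; exact numbers depend entirely on your hardware:

```julia
using CUDA, LinearAlgebra

n = 4096
A, B   = rand(Float32, n, n), rand(Float32, n, n)
dA, dB = CuArray(A), CuArray(B)
A * B; CUDA.@sync(dA * dB)       # warm up so the timings exclude compilation

@time A * B                      # CPU BLAS, O(n^3)
@time CUDA.@sync dA * dB         # GPU, O(n^3): where GPUs tend to win

x  = rand(Float32, n)
dx = CuArray(x)
x .+ 1f0; CUDA.@sync(dx .+ 1f0)  # warm up
@time x .+ 1f0                   # CPU, O(n)
@time CUDA.@sync dx .+ 1f0       # GPU, O(n): launch overhead dominates
```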
But note that there’s effectively 3 different ways that GPUs can be used here, and we shouldn’t conflate the 3.
1. GPU use inside the user’s `f`, where all the optimizer really needs to do is ensure that its operations keep the state on the GPU (to reduce memory overhead), and thus the core aspect of performance is whether your `f` is suitable for GPUs. This is what you would do for very large state vectors, like the weight vector of a deep neural network.

So again, more details on (2) and (3) coming very soon (it’s been one of the big Julia Lab projects since the DiffEqGPU paper work was completed), but for now you can find the tutorials to do (1). Whether that is useful is largely dependent on context as described above.
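For (1), here is a hedged sketch of what “keep the state on the GPU” can look like with Optimization.jl; the choice of OptimizationOptimisers’ Adam and Zygote AD is an assumption for illustration, not the only option:

```julia
# Sketch of approach (1): the state vector lives on the GPU, and the
# optimizer only has to keep it there. Adam + AutoZygote are illustrative
# choices, not requirements.
using Optimization, OptimizationOptimisers, CUDA, Zygote

# GPU-friendly objective: only broadcasts/reductions, no scalar indexing
loss(u, p) = sum(abs2, u .- p)

u0 = CUDA.fill(0.0f0, 10_000)   # initial state on the GPU
p  = CUDA.fill(1.0f0, 10_000)

optf = OptimizationFunction(loss, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)
sol  = solve(prob, Adam(0.1f0); maxiters = 100)
```

Whether this pays off comes down to the point above: `loss` here is a single O(n) reduction, so the GPU mostly buys you avoided transfers, not speed; a neural-network-sized `f` is where it starts to matter.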