A follow-up question.
I have a quad-core computer. I can run 4 instances of Python, one on each core, in parallel. How does Julia work with multi-core computers? Does it automatically use all available cores, or just one at a time? Can it also make use of the graphics card's cores?
Julia is much better than Python for threading. The GIL means that you need separate processes in Python to use multiple cores, but in Julia the preferred way is to use threads, which are much lighter weight. Currently, most code uses only one core, but Julia's threading is pretty new (1.3), so over the next year or so, you should expect more and more of Base to use multiple cores effectively. As for the GPU, Julia can use it (mainly through CUDA), but this is very non-automatic currently.
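For reference, the "separate processes" approach on the Python side can be sketched as below. This is a minimal sketch using only the standard library; the workload (a sum of squares) and the helper names `sum_squares`, `chunk_ranges` and `parallel_sum_squares` are made up for illustration and are not taken from the original program.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def sum_squares(bounds):
    """CPU-bound work on one chunk: sum of i*i for start <= i < stop."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def chunk_ranges(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pairs."""
    step, extra = divmod(n, parts)
    ranges, start = [], 0
    for k in range(parts):
        # The first `extra` chunks get one additional element each.
        stop = start + step + (1 if k < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def parallel_sum_squares(n, workers=None):
    """Run sum_squares on each chunk in its own process (sidesteps the GIL)."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, chunk_ranges(n, workers)))
```

When run as a script, the call to `parallel_sum_squares(...)` should sit under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module. Each worker is a full Python process, which is the overhead that threads avoid.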
You can rewrite things in PyTorch and put all the computation on the GPU; typically a 35x speedup is expected.
Wow 35x.
I just checked, though. Wikipedia says that PyTorch is a machine learning library. My program is not doing machine learning.
It is a (kind of) replacement for NumPy. The speedup is more a matter of GPU vs CPU than of Python vs Julia.
I see now. I will investigate PyTorch and see if I can implement it.
Thanks Peter
You could use CuArrays / CUDAnative (for NVIDIA GPUs) for GPU computing in Julia. Depending on what you want to do (and how it is implemented), you may be 100 times faster or 1000 times slower than on the CPU.
GPU calculation units are limited in functionality and individually very slow (compared to CPU cores), but there are a lot of them.
I would first try optimizing the algorithm on the CPU (as outlined above), and check afterwards whether moving parts of the calculation (like matrix operations) to the GPU would be beneficial.
Edit: CuArrays is practically a 1:1 replacement for native Julia arrays on GPUs. Therefore, it is probably similar to the PyTorch suggestion above, with the added benefits of Julia's CPU speed and the possibility of writing CUDA kernels directly in Julia (with CUDAnative).
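To put a rough number on "check afterwards whether moving parts of the calculation would be beneficial": a back-of-the-envelope bound is Amdahl's law. The sketch below assumes you have measured what fraction of the runtime is spent in the part you would offload; the function name `amdahl_speedup` and the example figures are illustrative only, and the estimate ignores host-to-device transfer time, so real gains can be lower.

```python
def amdahl_speedup(offload_fraction, device_speedup):
    """Overall speedup when `offload_fraction` of the runtime (0..1)
    becomes `device_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if device_speedup <= 0:
        raise ValueError("device_speedup must be positive")
    remaining = 1.0 - offload_fraction          # part that stays on the CPU
    accelerated = offload_fraction / device_speedup
    return 1.0 / (remaining + accelerated)


# Hypothetical case: 90% of the runtime is matrix work that runs 35x faster.
print(round(amdahl_speedup(0.9, 35), 2))
```

Under these assumed numbers the whole program gets about 8x faster, not 35x, because the 10% left on the CPU dominates once the rest is accelerated.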
No… a speedup of 35x is definitely not something to expect from a GPU. Whether a GPU is useful depends heavily on how well the problem can be converted to an SPMD form, which is possible only for a small subset of problems (that happens to include dense matrix multiplication). There are a lot of common things, like SVD or even just small matrix operations, which just do not have good GPU algorithms. Suggesting 35x magic is…
Yeah, SVD is slow on the GPU currently, but I guess the problem at hand does not involve such routines.