[blog post] Introduction to GPU programming

Any function you write for the GPU will get executed in parallel; there is almost no opting out of that. So even the most basic kernel call on the GPU is in fact already a loop, since it always gets scheduled to run many times in parallel, each invocation with a different index. This sets up a different context for the whole “do I need vectorization” discussion: vectorization is a very easy way to profit from this execution model, but you might as well write a GPU function that uses loops and is fast, as long as you can run that function a few hundred times in parallel!
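
To make that concrete, here is a minimal sketch in CUDA (an assumption on my part, the discussion isn’t tied to one GPU API): the kernel body is written for a single element, and the launch itself plays the role of the outer loop.

```cuda
#include <cuda_runtime.h>

// Every thread executes this same function in parallel; the only thing
// that differs per invocation is the index each thread computes for itself.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this invocation's index
    if (i < n)            // guard: the grid may be slightly larger than n
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // The launch is the implicit loop: roughly n threads, one per element.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```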
So you could indeed write a loop over a large array in parallel, have smaller sum loops inside the big loop, and it’s fast :wink:
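
For example (again a CUDA sketch, with made-up names like `row_sums`): one thread per row of a matrix, each thread running an ordinary sequential sum loop over its row. The parallelism comes from launching one thread per row, not from vectorizing the inner loop.

```cuda
#include <cuda_runtime.h>

// One thread per row; the inner loop over the columns runs serially
// within each thread, and that's fine as long as there are enough
// rows to keep the GPU busy.
__global__ void row_sums(const float *matrix, float *sums, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    float acc = 0.0f;
    for (int c = 0; c < cols; ++c)   // plain sequential sum loop
        acc += matrix[row * cols + c];
    sums[row] = acc;
}
```

Launched as e.g. `row_sums<<<(rows + 255) / 256, 256>>>(d_matrix, d_sums, rows, cols);`, this keeps the GPU saturated once `rows` reaches a few hundred or more.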

Also, how do I invoke the GPU’s sorting functions to sort a large vector?

Implement it :frowning: Or wrap a library that already does it. Radix sort would be a nice start; there should be plenty of open-source GPU kernels one could build upon!
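
For the “wrap a library” route, here is a minimal sketch using Thrust, which ships with the CUDA Toolkit (the choice of Thrust is my assumption; for primitive key types its sort dispatches to a radix sort under the hood):

```cuda
#include <cstdlib>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

int main()
{
    const int n = 1 << 24;

    // Fill a host vector with pseudo-random keys, then copy to the device.
    thrust::host_vector<int> h(n);
    for (int i = 0; i < n; ++i)
        h[i] = std::rand();
    thrust::device_vector<int> d = h;  // host-to-device copy

    // Sorting device iterators runs entirely on the GPU; no hand-written
    // kernel needed.
    thrust::sort(d.begin(), d.end());

    return 0;
}
```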
Contributions are more than welcome :slight_smile: