Any good OpenCL examples to demonstrate a speedup?

I am taking my first steps trying to use Julia with GPUs, even though all I have available on my machine is a humble integrated Intel chip (UHD Graphics 620). I was pretty stoked to manage to run the CLBlast.jl example, but when I timed it later there didn’t seem to be any improvement over the CPU. I understand that not every calculation will necessarily end up being faster, but does anybody have a good, simple example where I might observe a significant speedup, also so I can know that the library is working fine? Are there other libraries I should try?

Also: is an integrated chip like that good for anything in anybody’s experience? Please note that I am not only interested in big-data kinds of workloads, but also in e.g. graphics applications, where even a small 2x speedup over the CPU might already be interesting.

I’d check out ArrayFire.jl, which is currently the only GPU array package I know of that works with Intel graphics.

Same goes for the example in OpenCL.jl btw

Thanks, that looks interesting. Unfortunately I got a segfault after installing the downloaded library and trying to load the module. Any hints on how to get this running?

Did you compare the runtime of CLBlast.jl and basic BLAS from Base? If so, was BLAS using multiple threads? Or did you compare CLBlast.jl on the CPU and the GPU?
In my experience, using OpenCL on an Intel CPU directly is often similarly fast or even faster than running the same code on the integrated GPU. There can be a really significant difference if you have a dedicated GPU.
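
To make the CPU side of that comparison explicit, here is a minimal sketch of a baseline that times a single-precision matrix multiply with the BLAS that ships with Julia, once with one thread and once with all hardware threads. It only uses the LinearAlgebra standard library and assumes Julia 1.6 or newer (for `BLAS.get_num_threads`). The helper name `time_cpu_gemm` and the matrix size are illustrative choices, not something taken from CLBlast.jl.

```julia
using LinearAlgebra  # stdlib; provides mul! and the BLAS submodule

# Hypothetical helper: time C = A * B on the CPU with a given BLAS thread count.
# Returns the best wall-clock time in seconds over `reps` runs.
function time_cpu_gemm(n::Integer, nthreads::Integer; reps::Integer = 3)
    A = rand(Float32, n, n)
    B = rand(Float32, n, n)
    C = similar(A)
    old = BLAS.get_num_threads()      # remember the current setting
    BLAS.set_num_threads(nthreads)
    try
        mul!(C, A, B)                 # warm-up run so compilation is not timed
        return minimum(@elapsed(mul!(C, A, B)) for _ in 1:reps)
    finally
        BLAS.set_num_threads(old)     # restore the previous thread count
    end
end

n = 1024  # illustrative size; small matrices are dominated by overhead
for t in (1, Sys.CPU_THREADS)
    secs = time_cpu_gemm(n, t)
    gflops = 2 * n^3 / secs / 1e9     # a dense n×n multiply costs about 2n^3 flops
    println("BLAS threads = $t: $(round(secs * 1e3, digits = 2)) ms, ",
            "$(round(gflops, digits = 1)) GFLOP/s")
end
```

If the GPU timing for the same `n` and element type is not clearly better than the multi-threaded number here, that is probably the hardware rather than a broken installation. When timing the GPU version, make sure the measurement waits for the kernel to finish, since OpenCL calls are typically asynchronous.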