General question on GPU programming and how to use the low-level C API

Hey everyone,

I just started learning GPU programming (and I am also rather new to Julia). I am currently working through CUDA by Example, reproducing everything in Julia.

I have two questions (everything is run in a Jupyter notebook):

  1. I have the following code to do a simple Hello World example.
using CUDAnative, CUDAdrv

function hello_world()
    @cuprintf("Hello Woarld from the GPU\n")
    return nothing  # kernels must not return a value
end

If I run @cuda hello_world() in a different cell, there is no output until I call synchronize().

However, when I run @cuda hello_world() inside the same cell, I do get output, but if I change the string I need to run the cell twice to see the new string. Again, if I add synchronize() this doesn’t happen (the new string gets printed the first time I run the cell).

Not sure I understand what is happening here…

  2. I am trying to call the low-level C API function cuDeviceGetProperties. It takes the arguments (prop, dev) of types (Ptr{CUdevprop}, CUdevice). For the device, I know I can get it with CuDevice(0), but I have no idea how to get the prop…

I tried defining my own struct with fields similar to those in the book (which would likely not work anyway, since the fields have probably changed in newer versions of CUDA), and it fails.

I did something similar to this:

using CUDAdrv

struct CUdevprop
    # (define fields here)
end

prop = Ref{CUdevprop}()

Any help would be greatly appreciated :slight_smile:



Hi Oliver,
Regarding question 2: to get the device properties you can use the CUDA.attribute function, e.g.

using CUDA

function print_gpu_properties()

    for (i,device) in enumerate(CUDA.devices())
        println("*** General properties for device $i ***")
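        # Assumption: CUDA.name and CUDA.capability stand in for the definitions missing here
        name = CUDA.name(device)
        println("Device name: $name")
        cap = CUDA.capability(device)  # VersionNumber, e.g. v"7.5"
        major, minor = cap.major, cap.minor
        println("Compute capabilities: $major.$minor")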
        clock_rate = CUDA.attribute(device, CUDA.CU_DEVICE_ATTRIBUTE_CLOCK_RATE)
        println("Clock rate: $clock_rate")
        device_overlap = CUDA.attribute(device, CUDA.CU_DEVICE_ATTRIBUTE_GPU_OVERLAP)
        print("Device copy overlap: ")
        println(device_overlap > 0 ? "enabled" : "disabled")
        kernel_exec_timeout = CUDA.attribute(device, CUDA.CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT)
        print("Kernel execution timeout: ")
        println(kernel_exec_timeout > 0 ? "enabled" : "disabled")