Demo: Example of CUDA/OpenGL interop in Julia


I’m working to learn a bit about graphics and CUDA programming in Julia. I made an effort to use GLFW and OpenGL to create a window, and to render graphics to that window from a CUDA kernel. I’ve seen that others have had pretty much the same questions as I had about how to do that, so I thought I’d share a self-contained example of how to use CUDA to render something to OpenGL.

To keep it as simple as possible, this example only renders a static checkerboard with red and black squares.
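To make the checkerboard pattern and the pixel layout concrete, here is a small CPU-only sketch of the same logic that the CUDA kernel further down uses. The helpers `rgba` and `checkerboard` are made up for this illustration and aren't part of the actual code. It assumes 1-based block indices like CUDA.jl's `blockIdx()`, and that blocks where x + y is even are the red ones.

```julia
# CPU-only sketch of the checkerboard logic; no GPU or OpenGL needed.
# `rgba` and `checkerboard` are hypothetical helpers for illustration only.

# Pack four 8-bit channels so that R is the least significant byte (0xAABBGGRR).
rgba(r, g, b, a) = UInt32(r) | (UInt32(g) << 8) | (UInt32(b) << 16) | (UInt32(a) << 24)

function checkerboard(width, height, blocksize)
    pixels = Vector{UInt32}(undef, width * height)
    red    = rgba(0xFF, 0x00, 0x00, 0x00)
    black  = rgba(0x00, 0x00, 0x00, 0x00)
    for row in 0:height-1, column in 0:width-1
        # 1-based block indices, like blockIdx() in CUDA.jl
        bx = column ÷ blocksize + 1
        by = row ÷ blocksize + 1
        color = (bx + by) % 2 == 0 ? red : black
        # Row-major 1D index; +1 because Julia arrays are 1-indexed.
        pixels[column + row * width + 1] = color
    end
    return pixels
end

pixels = checkerboard(64, 64, 16)
```

On the GPU the two loops disappear: each thread computes its own `column` and `row` and writes a single pixel.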

Basically, what this does is

  • Create a window using GLFW.jl
  • Create a rectangle using OpenGL and map a texture onto it
  • Map that texture to a CUDA “graphics resource”
  • Write to that graphics resource from a CUDA kernel.

I have no idea if this is the correct way, or even a remotely efficient way, but it was the first thing I got working. My naive thought is that one ought to be able to write directly to whatever buffer GLFW created for us, but I don’t know if that would actually be possible. I get the feeling that trying to do CUDA/OpenGL interop the way I want is abuse of OpenGL, so maybe there’s no good way of doing it.

I believe it only requires these packages:

] add CUDA ModernGL GLFW

I pieced it together from a number of different sources, which are listed in the comments at the top of the code.

Here is the actual code:

# A self-contained example of CUDA/OpenGL interoperability.
# A CUDA kernel writes pixel color values to a texture,
# which is then displayed by OpenGL.
# This is the first successful attempt I made at this, so
# I have no idea about how effective a method it is.
# Essentially, this OpenGL code creates a rectangle out of
# two triangles, and maps a texture onto this rectangle.
# The CUDA kernel then writes pixels to that texture.
# This should only require the Julia packages
# - CUDA
# - GLFW
# - ModernGL
# All of this was pieced together from different sources:
# NVIDIA CUDA documentation
# A Julia Discourse post by Tim Besard
# A GitHub repository for the source code for the book CUDA For Engineers:
# OpenGL 4 examples in Julia in GitHub
# GLFW.jl

using GLFW
using CUDA
using ModernGL

# Render a checkered red surface.
# Even thread blocks are red and odd blocks are black.
function gpurender!(pixels, width)
	column = threadIdx().x - 1 + blockDim().x * (blockIdx().x - 1)
	row = threadIdx().y - 1 + blockDim().y * (blockIdx().y - 1)

	# Colors are stored in RGBA format. We're setting them in UInt32
	# so R is the least significant byte, and A is the most significant,
	# so it looks like 0xAABBGGRR.
	red   = 0x000000FF
	black = 0x00000000

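	# Block indices are 1-based; a block counts as "even" when x + y is even.
	isevenblock = (blockIdx().x + blockIdx().y) % 2 == 0
	c = isevenblock ? red : black

	# The pixel array is 1D, so figure out the index of this pixel
	pixelindex = column + row * width
	# and the array is of course 1-indexed, so add 1.
	pixels[pixelindex + 1] = c

	# A CUDA.jl kernel must not return a value.
	return nothing
end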


# The vertex and fragment shaders are OpenGL shaders that map the texture
# to the triangles we display on the screen.
# Adapted from
const VERTEX_SHADER = """
#version 410

layout (location = 0) in vec3 vertex_position;
layout (location = 1) in vec2 vt; // per-vertex texture co-ords

out vec2 texture_coordinates;

void main() {
	texture_coordinates = vt;
	gl_Position = vec4(vertex_position, 1.0);
}
"""

# From
const FRAGMENT_SHADER = """
#version 410

in vec2 texture_coordinates;
uniform sampler2D basic_texture;
out vec4 frag_colour;

void main() {
	vec4 texel = texture(basic_texture, texture_coordinates);
	frag_colour = texel;
}
"""

# Here we map a Pixel Buffer Object (PBO) in OpenGL to a CUDA "graphics resource".
# This graphics resource can be sent into the CUDA kernel above, as a UInt32 array
# of pixel values.
# The pixel values will then end up in the texture buffer, as far as I understand.
# See the NVIDIA CUDA documentation for more info about what the methods
# - cuGraphicsMapResources
# - cuGraphicsUnmapResources
# - cuGraphicsResourceGetMappedPointer_v2
# actually do.
function render(resource)
	# Map CUDA graphics resource
	# This allows us to write to the Pixel Buffer Object from CUDA.
	CUDA.cuGraphicsMapResources(1, [resource], stream())

	# Here we get a CUDA pointer to the graphics resource.
	deviceptrref = Ref{CUDA.CUdeviceptr}()
	numbytesref = Ref{Csize_t}()
	CUDA.cuGraphicsResourceGetMappedPointer_v2(deviceptrref, numbytesref, resource)

	# Here we reinterpret it as a UInt32 CuArray.
	deviceptr = reinterpret(CuPtr{UInt32}, deviceptrref[])
	len = Int(numbytesref[] ÷ sizeof(UInt32))
	# devbuffer is the final CuArray{UInt32} that we can send into our kernel,
	# and write the pixel values to.
	devbuffer = unsafe_wrap(CuArray, deviceptr, len)

	# Width in pixels of the thing we're rendering.
	# It's needed to calculate the pixel index below.
	# Note that the width 1024 is hard coded in a couple of places.
	width = 1024
	@CUDA.sync @cuda threads=(16, 16) blocks=(64, 64) gpurender!(devbuffer, width)

	# Unmap CUDA graphics resource
	CUDA.cuGraphicsUnmapResources(1, [resource], stream())
end

# We want to create a PBO which is a "pixel buffer object".
# That's where we will write our pixel color information.
# Then we'll generate a texture.

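function draw(window, vaoid::GLuint)
	width = 1024
	height = 1024
	# With a PBO bound to GL_PIXEL_UNPACK_BUFFER, C_NULL is an offset into that
	# buffer, so this copies the pixels written by CUDA into the texture.
	glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA,
	             GL_UNSIGNED_BYTE, C_NULL)

	glViewport(0, 0, GLFW.GetFramebufferSize(window)...)
	glClear(GL_COLOR_BUFFER_BIT)

	glBindVertexArray(vaoid)
	glDrawArrays(GL_TRIANGLES, 0, 6)
end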

# Create an OpenGL shader from the VERTEX_SHADER and FRAGMENT_SHADER above.
# Adapted from
function createshader(source::String, type::GLenum)
	id = glCreateShader(type)

	glShaderSource(id, 1, Ptr{GLchar}[pointer(source)], C_NULL)
	glCompileShader(id)

	result = Ref{GLint}()
	glGetShaderiv(id, GL_COMPILE_STATUS, result)
	if result[] != GL_TRUE
		@error "Bad shader: $(type)"
	end

	return id
end


# Create an OpenGL program from the vertex and fragment shader.
# Adapted from
function createprogram(vertexshader, fragmentshader)
	id = glCreateProgram()
	glAttachShader(id, vertexshader)
	glAttachShader(id, fragmentshader)
	glLinkProgram(id)

	result = Ref{GLint}()
	glGetProgramiv(id, GL_LINK_STATUS, result)

	if result[] != GL_TRUE
		error("Could not link shader program")
	end

	return id
end


function main()
	# The first CUDA launch takes a relatively long time (seconds), so we do it once at
	# startup. If we don't do this, then it will take seconds after we've created
	# the window, in the first render call. Then the window will seem to be frozen,
	# and unresponsive. You may then get a popup asking if you want to Wait or Force Quit.
	print("Preparing CUDA kernel...")
	fakearray = CuArray{UInt32}(undef, 1)
	@CUDA.sync @cuda launch=false gpurender!(fakearray, 0)
	println(" ready")

	# Create a window and its OpenGL context.
	# Note that the width and height are exactly the number of threads per block times
	# the number of blocks in the CUDA kernel call: 16 * 64 = 1024.
	# threads=(16, 16) blocks=(64, 64)
	# To have a different width or height, one needs to add boundary checks in the CUDA kernel.
	# Also note that the width value is hard coded in a couple of places, so if you modify this,
	# then also send the width/height around in the methods that need them.
	width = 1024
	height = 1024
	window = GLFW.CreateWindow(width, height, "Julia CUDA/OpenGL interop")

	# Make the window's context current
	GLFW.MakeContextCurrent(window)

	# Make a Pixel Buffer Object (PBO)
	# This is the buffer that the CUDA kernel writes pixel values to.
	# Adapted from CUDA For Engineers
	pbo = Ref(GLuint(0))
	glGenBuffers(1, pbo)
	glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[])
	glBufferData(GL_PIXEL_UNPACK_BUFFER, 4*width*height*sizeof(GLubyte), C_NULL, GL_STREAM_DRAW)

	# Make a Texture Object
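	# The pixel values end up in this texture.
	# Adapted from CUDA For Engineers
	tex = Ref{GLuint}()
	glGenTextures(1, tex)
	glBindTexture(GL_TEXTURE_2D, tex[])
	# No mipmaps are uploaded, so pick a minification filter that doesn't need them.
	glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST)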


	# CUDA.CUgraphicsResource represents a _pointer_ type to graphics resource.
	# Here the Pixel Buffer Object is registered as a CUDA graphics resource.
	# This allows us to map it later in the `render` method.
	graphicsResourceRef = Ref{CUDA.CUgraphicsResource}()
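	# Assumed flag value: 2 is CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD in the CUDA
	# driver API (CUDA only writes the buffer and never reads it). 0 would mean no hints.
	registerFlags = Cuint(2)
	CUDA.cuGraphicsGLRegisterBuffer(graphicsResourceRef, pbo[], registerFlags)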
	graphicsResource = graphicsResourceRef[]

	# Creating the Vertex Buffer Object, Texture Coordinates, and Vertex Array, is
	# adapted from
	# Make a Vertex Buffer Object, that will define our rectangle.
	# This defines a rectangle, using two triangles. Together the triangles fill
	# the entire window.
	rectanglecoords = GLfloat[-1.0, -1.0, 0,
	                           1.0, -1.0, 0,
	                           1.0,  1.0, 0,

	                           1.0,  1.0, 0,
	                          -1.0,  1.0, 0,
	                          -1.0, -1.0, 0]
	vbo = Ref{GLuint}()
	glGenBuffers(1, vbo)
	glBindBuffer(GL_ARRAY_BUFFER, vbo[])
	glBufferData(GL_ARRAY_BUFFER, sizeof(rectanglecoords), rectanglecoords, GL_DYNAMIC_DRAW)

	# Make a Texture Coordinate thing that defines how our texture maps to our rectangle.
	texcoords = GLfloat[0.0, 0.0,
	                    1.0, 0.0,
	                    1.0, 1.0,

	                    1.0, 1.0,
	                    0.0, 1.0,
	                    0.0, 0.0]
	texcoordvbo = Ref{GLuint}()
	glGenBuffers(1, texcoordvbo)
	glBindBuffer(GL_ARRAY_BUFFER, texcoordvbo[])
	glBufferData(GL_ARRAY_BUFFER, sizeof(texcoords), texcoords, GL_DYNAMIC_DRAW)

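	# Create a Vertex Array Object, which records which buffer feeds which vertex attribute.
	vao = Ref{GLuint}()
	glGenVertexArrays(1, vao)
	glBindVertexArray(vao[])
	glBindBuffer(GL_ARRAY_BUFFER, vbo[])
	glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, C_NULL)
	glBindBuffer(GL_ARRAY_BUFFER, texcoordvbo[])
	glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 0, C_NULL)
	# Attributes 0 and 1 match the `location` qualifiers in the vertex shader.
	glEnableVertexAttribArray(0)
	glEnableVertexAttribArray(1)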

	# This sets a very green clear color for the background.
	# This is useful for debugging. If you see some green in your window,
	# then it comes from here, and isn't covered by what we write from the
	# CUDA kernel.
	glClearColor(0.0, 1.0, 0.0, 1.0)

	# OpenGL shader compilation and linking.
	vert_shader = createshader(VERTEX_SHADER, GL_VERTEX_SHADER)
	frag_shader = createshader(FRAGMENT_SHADER, GL_FRAGMENT_SHADER)
	shader_prog = createprogram(vert_shader, frag_shader)
	glUseProgram(shader_prog)

	# Loop until the user closes the window
	while !GLFW.WindowShouldClose(window)

		# Render here
		render(graphicsResource) # <- Call the CUDA kernel to write pixels.
		draw(window, vao[])

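		# Swap front and back buffers
		GLFW.SwapBuffers(window)

		# Poll for and process events
		GLFW.PollEvents()
	end

	# Unregister CUDA resources
	CUDA.cuGraphicsUnregisterResource(graphicsResource)
	glDeleteBuffers(1, pbo)
	glDeleteTextures(1, tex)

	GLFW.DestroyWindow(window)
end

main()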
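
The kernel above only works when the width and height are exact multiples of the block size. A common way around that is to round the number of blocks up and let the threads that fall outside the image do nothing. Here is a CPU-only sketch that emulates such a launch and checks that every pixel gets written exactly once. `emulate_launch` is a hypothetical helper, and the 1000 by 700 size is just an example.

```julia
# CPU emulation of a guarded 2D kernel launch; no GPU needed.
# `emulate_launch` is a hypothetical helper for illustration only.
function emulate_launch(width, height, threads)
    # Round up so partial blocks at the right and bottom edges are covered.
    blocks = cld.((width, height), threads)
    writes = zeros(Int, width * height)
    for by in 1:blocks[2], bx in 1:blocks[1], ty in 1:threads[2], tx in 1:threads[1]
        # Same index math as the kernel, with 1-based thread and block indices.
        column = tx - 1 + threads[1] * (bx - 1)
        row    = ty - 1 + threads[2] * (by - 1)
        # The guard: threads outside the image skip the write.
        (column < width && row < height) || continue
        writes[column + row * width + 1] += 1
    end
    return blocks, writes
end

blocks, writes = emulate_launch(1000, 700, (16, 16))
```

In the real kernel, the guard would be an `if column < width && row < height` around the pixel write, with `height` passed in as an extra argument and `blocks` computed with `cld` at the launch site.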