Cairo offscreen rendering


#1

Hey all! I’m working on an application which uses Gtk.jl and associated Graphics packages to render real-time simulations of spiking neural networks. The application is intended to be able to render a large number of different spike raster plots and voltage graphs at a time, each updating at somewhere between 10 and 30 FPS. I’m currently employing the Canvas widget to plot these graphics using a simple 2D grid of colored rectangles, with the average graph having a size of around 64 x 64 rectangles.

My issue right now is that even with just 2 or 3 Canvases running at a slow 10 FPS, my Xorg process is eating up my poor fake-quad-core CPU’s resources (I’ve of course verified that I’m using GPU rendering, although it is an NVIDIA laptop GPU). I’ve read online that this can be fixed by rendering everything offscreen to an ImageSurface in Cairo, and then painting it all to the screen in one command. This sounds quite reasonable to me, but I have no idea how to actually do this, as all the tutorials I’ve found only show how to do this with the Cairo C interface.

Could anyone point me to an example of doing this sort of offscreen rendering, such that it can work in Gtk.jl? I’d very much appreciate it! :slight_smile:


#2

Gtk.jl should handle buffer swaps for you. You might need to at least sketch out what your render loop looks like, if you want more info.


#3

Hey @tim.holy, thanks for the quick reply! Good point about the code sketch, I seem to have neglected to include one initially. Here’s an MWE that closely emulates what my application will be doing:

using Gtk, Gtk.ShortNames, Graphics

win = Window("Test")
hbox = Box(:h)
setproperty!(hbox, :homogeneous, true)
push!(win, hbox)
for i = 1:4
  canv = Canvas()
  push!(hbox, canv)
  @schedule begin
    while true
      @guarded draw(canv) do widget
        ctx = getgc(canv)
        w, h = width(canv), height(canv)

        offsetW = w/32
        offsetH = h/32
        # Paint a 32 x 32 grid of randomly colored cells.
        for x = 1:32
          for y = 1:32
            set_source_rgb(ctx, rand(), rand(), rand())
            # (x-1) and (y-1) so the grid starts at the top-left corner
            rectangle(ctx, (x-1)*offsetW, (y-1)*offsetH, offsetW, offsetH)
            fill(ctx)
          end
        end
      end
      sleep(1/10)
    end
  end
end
showall(win)

cond = Condition()
signal_connect(win, :destroy) do widget
  notify(cond)
end
wait(cond)

Hopefully I’m not doing something stupid here; this just seemed like the logical approach for the application model I’m working with right now. You can increase the upper bound of the loop over i above 4, but it is already getting quite slow at 4 canvases for me. It seems like the julia process is mostly quiet, but Xorg eats at least a full CPU core trying to render this. I figured that it’s doing a redraw at every fill(ctx) call, but I can’t be sure.


#4

That code works pretty well for me (QuartzSurface). Gtk does manage the buffer swap, but does everything using the native surface type (X11 for you). You could try replacing gdk_window_create_similar_surface with gdk_window_create_similar_image_surface in Gtk.jl/src/cairo.jl and see if that alters your performance.

Or if you’re in the mood just to write the screen pixels directly, you could wrap a matrix in a GdkPixbuf and draw that to a GtkImage. For example:

img = Image()
data = Matrix{Gtk.RGB}(100, 100)
buf = Pixbuf(data=data, has_alpha=false)

# draw to data then call:
setproperty!(img, :pixbuf, buf)

#5

Yes, Jameson is right that rather than drawing lots of boxes you might consider blitting as an image. If you’d rather use Cairo directly these lines might be informative.


#6

Some points here:

  • Actually, the Cairo.jl API isn’t that far from the C API, so porting code is in most cases a matter of find/replace (this does not apply to the GTK-Cairo interfacing…)
  • How did you verify that Cairo is rendering via the GPU? AFAIU this ONLY happens with an XCB backend and only for on-screen rendering.
  • If I understand correctly, you’re trying to put a rectangular field of colored cells on screen. In that case, yes, there are more efficient ways than filling 1024 rectangular areas. In many cases the work is split into painting to memory and then putting the data on screen (Jameson’s example, which is a pixel-to-pixel copy); for that you’d need to do the rectangle filling in memory yourself.
  • Another way, which I used in e.g. https://github.com/GiovineItalia/Compose.jl/pull/141/files#diff-515ba812075dc0588b80962eb71ad83e, is to color single pixels of a surface (prim.data) and then paint that on screen by scaling. pixman, Cairo’s internal bitmap handler, is quite efficient at that (FILTER_NEAREST).
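
As a rough illustration of "filling in memory yourself", here is a minimal sketch in plain Julia that builds a 32 x 32 grid of packed pixels, one UInt32 per cell. It assumes Cairo's RGB24 layout (0x00RRGGBB in each 32-bit word); the helper names pack_rgb24 and random_grid are made up for this example and are not part of Cairo.jl or Gtk.jl.

```julia
# Pack 8-bit red, green and blue channels into one 32-bit word laid out as
# 0x00RRGGBB, which is how Cairo's RGB24 image format stores a pixel.
pack_rgb24(r::Integer, g::Integer, b::Integer) =
    (UInt32(r) << 16) | (UInt32(g) << 8) | UInt32(b)

# Build an n x n grid with one random color per cell (hypothetical helper).
function random_grid(n::Integer)
    grid = zeros(UInt32, n, n)
    for y = 1:n, x = 1:n
        grid[y, x] = pack_rgb24(rand(0:255), rand(0:255), rand(0:255))
    end
    return grid
end

grid = random_grid(32)   # 1024 cells, but only 4 KiB of pixel data
```

Each frame then updates 1024 array entries instead of issuing 1024 fill calls, and the small grid can be handed to Cairo as an image surface and scaled up to the canvas size.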

#7

@tim.holy: Gtk.jl should handle buffer swaps for you. You might need to at least sketch out what your renderloop looks like, if you want more info.

Are you saying that GTK will only refresh its buffer at the very end of the @guarded draw() command?

I could never really tell when things were redrawn to screen, so I always did my buffering explicitly to avoid potential issues (ex: in InspectDR.jl).


#8

FYI: The following snippet shows how I did buffered rendering on my own plotting tool:

using Gtk, Gtk.ShortNames, Graphics
import Cairo  # needed for the qualified Cairo.* calls below

win = Window("Test")
hbox = Box(:h)
setproperty!(hbox, :homogeneous, true)
push!(win, hbox)
for i = 1:4
  canv = Canvas()
  push!(hbox, canv)
  @schedule begin
    while true
      @guarded draw(canv) do widget
        w, h = width(canv), height(canv)
        buf = Gtk.cairo_surface_for(canv) #Re-create in case size changes
        bctx = Cairo.CairoContext(buf)

        offsetW = w/32
        offsetH = h/32
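        # Draw the 32 x 32 grid into the offscreen buffer, not the canvas.
        for x = 1:32
          for y = 1:32
            set_source_rgb(bctx, rand(), rand(), rand())
            # (x-1) and (y-1) so the grid starts at the top-left corner
            rectangle(bctx, (x-1)*offsetW, (y-1)*offsetH, offsetW, offsetH)
            fill(bctx)
          end
        end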

        #Perform block transfer (Efficiently paint canvas in "single step"):
        ctx = getgc(canv)
        Cairo.set_source_surface(ctx, buf, 0, 0)
        Cairo.paint(ctx) #Applies contents of buf

        Cairo.destroy(bctx)
        Cairo.destroy(buf)
      end
      sleep(1/10)
    end
  end
end
showall(win)

cond = Condition()
signal_connect(win, :destroy) do widget
  notify(cond)
end
wait(cond)

I admit, I am uncertain if explicit buffering is actually needed. I only did it because I lack detailed knowledge of how GTK is implemented.

Also: It is probably inefficient to create a new buffer each time you redraw your window. It is likely better to re-use the same image buffer through the lifetime of the window. Of course that involves some sort of scheme to compensate for a user eventually resizing said window.
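
One possible shape for such a scheme, sketched in plain Julia with a pixel matrix standing in for the Cairo surface: keep the buffer in a Ref and reallocate only when the requested size differs. The helper name ensure_buffer! is hypothetical, and a real version would create and destroy Cairo surfaces rather than matrices.

```julia
# Return the cached buffer, reallocating only if the size changed.
# `cache` holds the current buffer; `w` and `h` are the widget size in pixels.
function ensure_buffer!(cache::Ref{Matrix{UInt32}}, w::Integer, h::Integer)
    if size(cache[]) != (h, w)
        cache[] = zeros(UInt32, h, w)   # allocate once per resize
    end
    return cache[]
end

cache = Ref(zeros(UInt32, 0, 0))      # empty until the first draw
a = ensure_buffer!(cache, 640, 480)   # first draw: allocates
b = ensure_buffer!(cache, 640, 480)   # same size: reuses `a`
c = ensure_buffer!(cache, 800, 600)   # resize: allocates a new buffer
```

Calling the helper at the top of every draw keeps the render loop simple, since the resize check costs only a tuple comparison per frame.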


#9

See here, here, and here. The buffer swapping is handled by Gtk.jl, not by GTK.


#10

Jameson, you seem to be correct about it rendering much better on a QuartzSurface, although it still chokes when I run it at a higher FPS or with more Canvases simultaneously. I’ll try to give your code a shot tomorrow to see if that improves anything.

@lobingera I’m not 100% sure that I’m using GPU rendering, but nouveau seems to get used successfully by Xorg, and I assume that the julia process would have a much higher CPU utilization than it does now if it were using CPU rendering (the julia process hovers around 2% utilization during rendering). If there’s a way to confirm that the GPU is being activated, I’d love to know how!

I did figure that constantly drawing a ton of little rectangles might be really slow, so your idea makes a lot of sense to me. Of course it’ll only work for plots where I’m using a grid, but that’s probably 90% of what I’ll be rendering anyway. I’ll try to get code similar to what you linked to working tomorrow, and will report back with results if I’m successful.

@MA_Laforge I just tested your code (adding in checks to ensure that I only create a new buffer on resize), and unfortunately it doesn’t seem to fix Xorg going crazy. I appreciate the code though, it may still prove useful later on!

This example that we’re using is probably my worst-case for what I’m trying to render, so ideally not everything will be as slow as this example (and will probably be faster on one of my desktop machines, which still need to be configured with X). Since these are modeling spiking neurons, I can always just plot the spikes themselves (which would only be a handful of rectangles) and only render voltage periodically, which would probably work fine. But if I can get the voltage plot to work well, that would be a big bonus! :smiley:


#11

I just checked -> https://gist.github.com/lobingera/912a7daf827406219da28c6e3e817896

With your plain example I get 12% load for Xorg and 10% for julia (see the top output in the screenshot).
With my approach of painting a bitmap by scaling I get 1% for Xorg and 4% for julia.

Both with a 1/10 s sleep.

julia> versioninfo()
Julia Version 0.5.2
Commit f4c6c9d* (2017-05-06 16:34 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel® Core™ i3-2120 CPU @ 3.30GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, sandybridge)

So this is with the X11 Intel driver.

And I think this could still be sped up by avoiding the double buffering in Gtk.jl.
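
For readers who do not follow the gist link, the idea of painting a bitmap by scaling can be sketched roughly as below. This is not the gist's code, just a minimal offscreen sketch that assumes Cairo.jl's CairoImageSurface, CairoRGBSurface, pattern_set_filter and FILTER_NEAREST; the 256 x 256 target size is an arbitrary stand-in for the canvas, and in a Gtk.jl draw callback the context would come from getgc(canv) instead.

```julia
using Cairo

# Small source image: one packed 0x00RRGGBB pixel per grid cell.
data = UInt32[rand(UInt32) & 0x00ffffff for y = 1:32, x = 1:32]
src = CairoImageSurface(data, Cairo.FORMAT_RGB24)

# Offscreen target standing in for the canvas (assumed 256 x 256 here).
dst = CairoRGBSurface(256, 256)
ctx = CairoContext(dst)

# Scale user space so the 32 x 32 source covers the whole target, then
# paint it once; FILTER_NEAREST keeps hard cell edges instead of blurring.
scale(ctx, 256 / 32, 256 / 32)
set_source_surface(ctx, src, 0, 0)
pattern_set_filter(get_source(ctx), Cairo.FILTER_NEAREST)
paint(ctx)
```

The whole frame is a single paint call, so the per-frame work sent to the display server no longer grows with the number of cells.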


#12

Oh my glob, that is amazing! I get 2% Xorg and 2% julia with your pixel scaling code… I can even increase the FPS to 30 without experiencing lag or tearing.

It seems like I’ve learned two things:

  1. Your approach is 1000% better than mine for rectangle grid-type simulations.
  2. My Xorg configuration is probably quite screwed up. Time to look back at my driver settings :smile:

Thanks for all the help!