How can this weird `@profview` vs `@time` result be explained?

I’m trying to profile `record` in Makie to see where there might be optimization potential, so I called `@profview` on a simple `record` invocation with a throwaway plot, like this:

julia> using GLMakie

julia> @profview record(lines(cumsum(randn(1000))), "test.mp4", 1:10, framerate = 60) do i
       end

Weirdly, the flame graph shows the `lines` call much more prominently than the recording machinery for the 10 frames; the `lines` column is about six times wider than the `record` part.

Inside `record`, the function that actually renders and writes out each frame is called `recordframe!`. I then edited the source to put a `@time` statement in front of that call, and ran this:

julia> @time "all" record(@time("lines", lines(cumsum(randn(1000)))), "test.mp4", 1:10, framerate = 60) do i
       end
lines: 0.008834 seconds (34.60 k allocations: 2.237 MiB)
recordframe!: 0.017088 seconds (969 allocations: 30.281 KiB)
recordframe!: 0.010208 seconds (967 allocations: 30.000 KiB)
recordframe!: 0.006260 seconds (967 allocations: 30.000 KiB)
recordframe!: 0.007005 seconds (967 allocations: 30.000 KiB)
recordframe!: 0.005392 seconds (967 allocations: 30.000 KiB)
recordframe!: 0.006734 seconds (967 allocations: 30.000 KiB)
recordframe!: 0.005713 seconds (967 allocations: 30.000 KiB)
recordframe!: 0.005487 seconds (967 allocations: 30.000 KiB)
recordframe!: 0.008419 seconds (967 allocations: 30.000 KiB)
recordframe!: 0.005806 seconds (967 allocations: 30.000 KiB)
all: 0.191205 seconds (101.17 k allocations: 9.412 MiB, 15.40% compilation time)
"test.mp4"

This shows that the `lines` part takes about 9 milliseconds, while each frame takes between 5 and 17 milliseconds, so the ten `recordframe!` calls together account for roughly 78 milliseconds. Why, then, is `recordframe!` so underrepresented in the flame graph? I thought `@profview` just takes backtraces periodically and counts how often each function appears in the call stack?

My only guess is that it has something to do with rendering happening on the GPU via OpenGL: maybe the sampling profiler is completely blocked from running while we wait for that? That would badly skew the distribution, but how can I find out whether that's the case?
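
One thing I could try, I suppose (just a sketch using the plain Profile stdlib, nothing Makie-specific; the warmup file name is a placeholder), is to collect the samples with `@profile` myself and print the report with C frames included, to see whether backtraces are taken at all while we sit inside OpenGL/ffmpeg calls:

using GLMakie, Profile

plt = lines(cumsum(randn(1000)))
record(i -> nothing, plt, "warmup.mp4", 1:10, framerate = 60)   # warm-up run so compilation stays out of the profile

Profile.clear()
@profile record(i -> nothing, plt, "test.mp4", 1:10, framerate = 60)

Profile.init()   # returns (buffer size, sampling delay in seconds), useful to estimate how many samples to expect

# C = true also shows C frames (OpenGL, ffmpeg, ...), so the flat report reveals whether
# samples are being taken while the program waits inside those calls, or whether
# recordframe! barely shows up in the raw counts as well.
Profile.print(C = true, format = :flat, sortedby = :count)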

Are you trying to profile compilation latency or runtime? Because you’re doing the former. If you want to profile runtime, ignore the first profiling results and profile again.

Also, is this in VS Code? I find profiling in VS Code unreliable, so when I want to profile I use ProfileCanvas.jl (other tools are available too) from the REPL, without any VS Code running.
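
For example, a plain-REPL session would look roughly like this (just a sketch; the throwaway plot and file names are placeholders):

using GLMakie, ProfileCanvas

plt = lines(cumsum(randn(1000)))
record(i -> nothing, plt, "warmup.mp4", 1:10, framerate = 60)   # run once first so compilation is excluded

# ProfileCanvas also provides a @profview macro; the result is shown as an HTML flame graph.
@profview record(i -> nothing, plt, "test.mp4", 1:10, framerate = 60)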


The code I posted omits the warmup runs, so no, I am measuring runtime performance correctly (except for the empty closure, but that barely matters).

This is in VS Code, yeah, but isn’t the underlying profiler just the basic one from Base? Why should it not work in that environment?
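
As far as I understand, the VS Code `@profview` just drives the Base Profile machinery and only the viewer differs; right after a `@profview` run I could print the same sample buffer as text from the REPL to check (a quick sketch):

using Profile

# If the VS Code flame graph is built from the Profile stdlib's buffer, the same data
# should be visible here immediately after an @profview call.
Profile.len_data()                                  # raw instruction pointers currently in the buffer
Profile.print(format = :flat, sortedby = :count)    # textual report over the same samples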