Pluto + GLMakie: Button action works with @async but crashes with Threads.@spawn

I am trying to run a continuous animation from inside Pluto, controlled by a run/pause button.

The "Making animated and interactive scientific visualizations in Makie.jl" tutorial uses @async to spin off the task, but the Julia docs for @async state: "It is strongly encouraged to favor Threads.@spawn over @async always even when no parallelism is required…"

However, using Threads.@spawn crashes Pluto/Julia the moment the button is pressed.
It runs OK when using @async instead.
Running the exact same code with Threads.@spawn in VS Code does work.

With Threads.@spawn I also tried starting Julia with 4 threads and Pluto with 2 threads (confirmed with Threads.nthreads() == 2), with the same crash.

Am I doing something wrong or is this a Pluto thing?
Is the use of @async acceptable?

MWE: a simple button with a counter

using Base.Threads
using GLMakie

Threads.nthreads()  # == 2 here; multiple threads are not required for @async

GLMakie.activate!(inline=false, title="Pluto Graph")
	# inline=false is not strictly needed, but guarantees a standalone window

fig2 = Figure(backgroundcolor=:maroon, size=(400,200))
display(fig2)

o_isrunning = Observable(false) 
o_i = Observable(0)

buttonlabels=("Pause", " Run ") 
o_buttonlabel = @lift string(buttonlabels[$o_isrunning+1],"\n",$o_i)

button = Button(fig2[1,1], label = o_buttonlabel )

GLMakie.on(button.clicks) do clicks
	o_isrunning[] = !o_isrunning[]
	if o_isrunning[]

		#Threads.@spawn begin  # Crashes
		@async begin           # Works    

			while o_isrunning[] && isopen(fig2.scene)
				o_i[] = o_i[]+1   # assigning to o_i[] already notifies its listeners; no extra notify() needed
				sleep(0.1)
				#yield()
			end

		end
	end
end



Start of the (long) crash message when using Threads.@spawn (it appears in the Julia shell; user ID redacted):

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ff900b00dc6 -- RegisterProcTableCallback at C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_b260c545909302e9\ig9icd64.dll (unknown line)
in expression starting at none:1
RegisterProcTableCallback at C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_b260c545909302e9\ig9icd64.dll (unknown line)
glBindBuffer at C:\Users\*****\.julia\packages\ModernGL\BUvna\src\functionloading.jl:73 [inlined]
bind at C:\Users\*****\.julia\packages\GLMakie\fj8mE\src\GLAbstraction\GLBuffer.jl:31
g
....

@async and @spawn both create tasks, but tasks created with @spawn may run on other threads (and migrate between them), while @async tasks may not. So @async gives you coroutines: multiple tasks running interleaved on the same thread. Makie is not thread-safe; its render loop runs via @async, and you will get crashes if you trigger something that updates OpenGL state from a different thread than the render thread.
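A minimal plain-Julia sketch of the difference described above (no Makie involved, variable names are illustrative): an @async task runs on the same thread as the task that created it, while a Threads.@spawn task may be scheduled on any thread of the default pool.

```julia
# Which thread does each kind of task run on?
caller_tid = Threads.threadid()

# @async: the child task is "sticky" and runs on the caller's thread.
async_tid = fetch(@async Threads.threadid())

# Threads.@spawn: the scheduler may pick any thread in the default pool
# (with `julia -t 1` this is necessarily the caller's thread as well).
spawn_tids = [fetch(Threads.@spawn Threads.threadid()) for _ in 1:8]

println("caller: ", caller_tid, ", @async: ", async_tid, ", @spawn: ", spawn_tids)
```

Started with several threads, the @spawn ids will usually differ from the caller's, which is exactly when OpenGL calls end up on the "wrong" thread.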

But in general Makie’s concurrency model is not very principled right now. I couldn’t tell you the “correct” way to change plot objects asynchronously that is guaranteed to work; empirically, doing things in @async blocks usually works fine because of the threading behavior I mentioned above. Maybe @sdanisch or @ffreyer could chime in, as they have much more insight into this part of Makie than I do.
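One common pattern that fits the "only touch OpenGL state from one thread" constraint is sketched below, under the assumption that it is enough for all updates of shared (plot) state to happen on the thread that owns it. This is not an official Makie recipe: heavy work runs with Threads.@spawn and only pushes results into a Channel, while a single @async consumer on the caller's thread applies them. The `applied` vector is a hypothetical stand-in for something like `o_i[] = v`.

```julia
# Sketch: do the work with Threads.@spawn, but apply every update of shared
# state from a single @async task that stays on the thread that created it.
function run_pipeline(n)
    results = Channel{Int}(32)          # hand-off between worker and consumer
    applied = Int[]                     # hypothetical stand-in for plot state
    consumer_tids = Int[]               # records where updates were applied
    home_tid = Threads.threadid()

    # Consumer: sticky @async task, so it runs on `home_tid`.
    # The loop ends once `results` is closed and drained.
    consumer = @async for v in results
        push!(applied, v)               # stand-in for e.g. `o_i[] = v`
        push!(consumer_tids, Threads.threadid())
    end

    # Producer: may run on any thread; it only touches the channel.
    producer = Threads.@spawn begin
        for i in 1:n
            put!(results, i^2)          # placeholder "expensive" computation
        end
        close(results)
    end

    wait(producer)
    wait(consumer)
    return applied, consumer_tids, home_tid
end

applied, tids, home = run_pipeline(100)
```

The channel is the only object shared across threads, and Channel is safe to use from multiple threads; whether Makie's own render loop tolerates this in every case is exactly the open question above.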

I’ve actually been puzzling about this too: @async code definitely doesn’t need the same kind of thread safety, since green tasks cannot actually run at the same time.
A simple example shows pretty clearly that you can’t just swap @async for @spawn:


function count(n)
    x = 0
    @sync for i in 1:n 
        @async x += 1
    end
    return x
end
function count2(n)
    x = 0
    @sync for i in 1:n
        Threads.@spawn x += 1
    end
    return x
end

count(10000) == 10000   # true
count2(10000)           # with more than one thread: an unpredictable number ≤ 10000 (updates get lost)
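For completeness, a sketch of one way to make the @spawn version of the counter correct: the shared counter becomes a Threads.Atomic, so concurrent increments from different threads cannot be lost (a lock would work too; `count_atomic` is an illustrative name).

```julia
# Race-free variant of `count2`: increments go through an atomic counter.
function count_atomic(n)
    x = Threads.Atomic{Int}(0)
    @sync for _ in 1:n
        Threads.@spawn Threads.atomic_add!(x, 1)
    end
    return x[]
end

count_atomic(10000)  # → 10000, with any number of threads
```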

I really have no idea why the warning is there without giving any such context.

The good news is that the PR "[WIP] Using a Compute graph as alternative to Observables" by SimonDanisch (MakieOrg/Makie.jl #4630 on GitHub) may bring much better thread safety to Makie, so that one can use @spawn more freely. It hopefully comes almost for free with that PR, but if it doesn’t, I’m not sure when we will find the time to improve it.

Thanks, folks!

As I mentioned, the problem seems to be specific to Pluto + GLMakie + threads, as the code using Threads.@spawn runs fine under VS Code.

I also realised that the (apparently?) only non-crashing way to access any ‘global’ variables under Pluto (i.e. variables defined in another cell) from inside the @async task is via an Observable. That makes sense, as Pluto’s core feature is to guarantee state consistency over the entire notebook.

This is true, but nowhere is it specified that a task keeps running until an explicit wait, sleep, yield or similar call. In principle, a task switch can happen between the fetch and the store in x = x + 1. It doesn’t do that right now (at least not for standard types), but if you do something like x[i] = x[i] + 1, it can happen even now, depending on the type of x.

That is, even in @async tasks one should take concurrency precautions.
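A minimal sketch of such a precaution (illustrative names): hold a ReentrantLock across the whole read-modify-write, so that a task switch between the read and the write no longer loses updates. The explicit `yield()` stands in for any hidden yield point.

```julia
# Even with @async (single thread), a yield between read and write can lose
# updates. Holding a lock across the whole read-modify-write prevents that.
function count_async_locked(n)
    x = Ref(0)
    lk = ReentrantLock()
    @sync for _ in 1:n
        @async begin
            lock(lk) do
                y = x[] + 1
                yield()      # stands in for a hidden yield point (IO, logging, ...)
                x[] = y
            end
        end
    end
    return x[]
end

count_async_locked(1000)  # → 1000
```

Tasks that find the lock taken simply wait; the lock does not block the thread, so this also works when every task runs on the same thread.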

Thanks @sgaure, that makes sense. However, I am totally new to asynchronous programming, so I’m still trying to get my head around it.
Would that mean using the low-level synchronization with schedule and wait?

It doesn’t do that right now (at least not for standard types), but if you do something like x[i] = x[i] + 1, it can happen now, depending on the type of x.

Do you happen to know whether accessing an observable value, obs[], counts as a standard type (and should be safe) or is at risk?

I don’t know.

Thanks all!

For posterity: I suspect the original problem with Threads.@spawn occurring in Pluto (only?) has something to do with Pluto issue 2779: “In Pluto’s source code we have lots of @async…”

This is wildly incorrect.

The @async code used here is still very bad. Just because you happened not to hit the race condition in the call you wrote does not mean that it can’t happen.

julia> function count(n)
           x = 0
           @sync for i ∈ 1:n
               @async begin
                   y = x + 1
                   yield()
                   x = y
               end
           end
           x
       end;

julia> count(10000)
1

Yield points can be highly unpredictable: e.g. they can be hit when one of the tasks does IO, hits a dynamic dispatch, or when any number of other non-visible things occur. They can also depend on things like the optimization level Julia was run with.
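To illustrate that a yield point need not be visible at the call site, here is a sketch based on the count example above: the task switch is buried in a helper. `log_progress` is hypothetical; its `yield()` stands in for whatever such a helper might do internally (IO, logging, taking a lock, ...).

```julia
# Hypothetical helper: looks harmless, but contains a yield point.
log_progress(i) = (yield(); nothing)

function count_hidden(n)
    x = 0
    @sync for i in 1:n
        @async begin
            y = x + 1
            log_progress(i)   # hidden task switch between the read and the write
            x = y
        end
    end
    return x
end

count_hidden(10000)  # far fewer than 10000 increments survive
```

Nothing in the body of `count_hidden` itself calls yield, sleep or wait, yet it has the same race as the explicit version.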

@async does not have significantly lower thread-safety requirements than @spawn. If anything, it’s kind of worse in this regard, because the problems can be harder to detect, so bugs can stay hidden for much longer!

You’re right, but my point was that in practice there are very few yield points in critical code (e.g. +(::Int, ::Int)), so it’s no surprise to get segfaults when going from @async to @spawn ;)
Even though this might be an indication that the @async code was never fully safe.

And my point is that you can’t know whether your code has yield points or not, so any code written on the assumption that there are no yield points is at best a ticking time bomb.

Where yield points are and when they occur is not something you can reason about at the library level in the overwhelming majority of cases. It can depend on things like whether the user set -O0, whether debug warnings are enabled, and all sorts of other things.

I don’t mean to attack you or whatever here, I just find it really concerning to see false and dangerous claims like this being made publicly on Discourse where people might get wrong ideas.

Well, I on the other hand have to explain every couple of weeks why people can’t just swap out @async for @spawn, just because the docs indicate you always should.

Thanks guys, point taken. But… for a relative newbie (especially compared to you folks): taking the above MWE, where and how should one secure that code to avoid conflicts?

(BTW: so far my full code “seems” to run OK, but I’ve taken the instability risk on board!)

Maybe a good solution to avoid races is to use the ticks observable, which fires when the scene is about to be rendered. Whatever you do synchronously in a callback there should not be able to mess with data in the render loop, as the ticks callback is invoked synchronously inside the render loop anyway.
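A sketch of what that could look like for the MWE, assuming a recent Makie version where `events(fig).tick` exists and each tick carries a `delta_time` field (seconds since the previous tick); check the Events docs for your version. No task is spawned at all: the button only flips o_isrunning, and the counter advances inside the tick callback, i.e. within the render loop. The 0.1 s step mirrors the sleep(0.1) of the original loop.

```julia
using GLMakie

fig2 = Figure(backgroundcolor = :maroon, size = (400, 200))

o_isrunning = Observable(false)
o_i = Observable(0)

buttonlabels = ("Pause", " Run ")
o_buttonlabel = @lift string(buttonlabels[$o_isrunning + 1], "\n", $o_i)
button = Button(fig2[1, 1], label = o_buttonlabel)

# The button only flips the flag; no @async/@spawn task is started.
on(button.clicks) do _
    o_isrunning[] = !o_isrunning[]
end

# Assumption: events(fig2).tick fires once per render-loop iteration and
# tick.delta_time is the time in seconds since the previous tick.
elapsed = Ref(0.0)
on(events(fig2).tick) do tick
    o_isrunning[] || return
    elapsed[] += tick.delta_time
    if elapsed[] >= 0.1            # same pacing as the original sleep(0.1)
        elapsed[] = 0.0
        o_i[] += 1                 # runs synchronously inside the render loop
    end
end

display(fig2)
```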
