Intermittent exception with @spawn; findall() seems to be cause?

I have some code that is designed to solve a puzzle. It searches for the ideal placement of numbers in an array. The idea is to find an array configuration that maximizes 1 … k.

For a given array, there can be several valid moves. These are found by using findall() on a dictionary called cpos to search for the next number in sequence.

Searching cpos for k can return multiple points that will be a valid position for k. The valid points are stored in mvals, from mvals=findall(x-> x==k, cpos).

I originally just picked a random point from mvals. To try and upgrade the search, I modified the function to spawn a thread for every point – computing the maximum “possible k” from that might be achieved from that point – then choosing the point with the highest “possible k”. It’s a “possible k”, because the function that does the search chooses random points until no more points can be placed.

However, sometimes the @spawn of the function to find “possible k” errors out. Not always. I think I’ve narrowed it down to the findall() in the search function.

Here is my environment; I’m using Julia 1.7.0

CPU:        Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz
Windows version 10.0.22000
Free memory: 43.0 GB
Julia Version 1.7.0
Commit 3bf9d17731 (2021-11-30 12:12 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, ivybridge)
Environment:
  JULIA.EXECUTABLEPATH = C:\Users\amgough\Julia\bin\julia.exe
  JULIA_GLPK_LIBRARY_PATH = C:\Program Files\glpk
  JULIA_HOME = C:\Users\amgough\Julia\bin\julia.exe
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 6

Here is the output from running the function – which I’ve tried to cut down to a “minimum example,” so it starts with a fixed matrix position:

ulia> (n,hut_cnt,final_k,a,cpos)=stonerr()
(7, 0, 35, Int32[0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0], Dict{Tuple, Integer}((18, 16) => 25, (20, 20) => 24, (16, 16) => 32, (19, 14) => 27, (21, 19) => 4, (18, 22)
=> 53, (15, 23) => 51, (19, 16) => 26, (16, 22) => 17, (18, 18) => 19…))

julia> (n,hut_cnt,final_k,a,cpos)=stonerr()
(7, 0, 32, Int32[0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0], Dict{Tuple, Integer}((23, 17) => 51, (18, 16) => 73, (20, 14) => 13, (16, 14) => 78, (15, 16) => 82, (12, 15)
 => 29, (19, 21) => 61, (17, 15) => 25, (22, 12) => 44, (16, 20) => 8…))

julia> (n,hut_cnt,final_k,a,cpos)=stonerr()
ERROR
cloc = 0.72
k = 27
cpos
Dict{Tuple, Integer}((18, 16) => 15, (26, 16) => 57, (20, 13) => 53, (24, 15) => 67, (17, 12) => 28, (17, 15) => 75, (16, 20) => 20, (20, 20) => 12, (19, 13) => 82, (20, 16) =
> 25, (19, 14) => 27, (16, 16) => 30, (25, 17) => 23, (25, 18) => 22, (14, 15) => 31, (21, 19) => 4, (19, 20) => 46, (18, 22) => 35, (19, 16) => 79, (18, 17) => 14, (20, 22) =
> 19, (18, 18) => 7, (26, 18) => 45, (26, 17) => 45, (24, 19) => 21, (15, 15) => 61, (16, 17) => 91, (20, 17) => 24, (17, 19) => 20, (20, 18) => 57, (19, 22) => 35, (22, 19) =
> 8, (16, 18) => 20, (21, 14) => 26, (21, 13) => 26, (19, 18) => 6, (21, 16) => 85, (23, 21) => 64, (24, 14) => 34, (17, 13) => 29, (17, 14) => 29, (24, 20) => 32, (24, 16) =>
 33, (17, 20) => 33, (17, 16) => 29, (22, 20) => 3, (22, 16) => 46, (25, 21) => 32, (21, 17) => 102, (21, 18) => 17, (14, 16) => 32, (18, 21) => 35, (22, 22) => 21, (24, 17) =
> 98, (17, 17) => 95, (20, 21) => 16, (17, 18) => 54, (22, 17) => 36, (15, 16) => 31, (22, 18) => 84, (19, 21) => 82, (14, 17) => 32, (23, 15) => 33, (18, 23) => 35, (25, 15)
=> 34, (20, 23) => 55, (18, 12) => 28, (18, 15) => 72, (23, 19) => 85, (26, 15) => 35, (19, 23) => 54, (16, 15) => 90, (20, 15) => 78, (24, 21) => 43, (22, 21) => 18, (25, 19)
 => 76, (19, 15) => 94, (19, 12) => 28, (23, 20) => 11, (23, 16) => 79, (18, 19) => 13, (21, 23) => 20, (26, 19) => 22, (25, 14) => 35, (16, 19) => 20, (20, 19) => 5, (25, 20)
 => 53, (21, 15) => 51, (23, 22) => 18, (25, 16) => 90, (19, 19) => 43, (18, 13) => 28, (23, 17) => 10, (23, 18) => 9, (18, 20) => 33)
ERROR: TaskFailedException

    nested task error: UndefRefError: access to undefined reference
    Stacktrace:
      [1] getindex
        @ .\array.jl:861 [inlined]
      [2] _iterate
        @ .\dict.jl:686 [inlined]
      [3] iterate
        @ .\dict.jl:690 [inlined]
      [4] iterate(f::Base.Iterators.Filter{Base.var"#104#106"{var"#63#64"}, Dict{Tuple, Integer}}, state::Int64)
        @ Base.Iterators .\iterators.jl:450
      [5] iterate
        @ .\generator.jl:44 [inlined]
      [6] grow_to!(dest::Vector{Tuple{Int64, Int64}}, itr::Base.Generator{Base.Iterators.Filter{Base.var"#104#106"{var"#63#64"}, Dict{Tuple, Integer}}, Base.var"#103#105"}, st
::Int64)
        @ Base .\array.jl:819
      [7] grow_to!(dest::Vector{Tuple}, itr::Base.Generator{Base.Iterators.Filter{Base.var"#104#106"{var"#63#64"}, Dict{Tuple, Integer}}, Base.var"#103#105"})
        @ Base .\array.jl:801
      [8] collect
        @ .\array.jl:721 [inlined]
      [9] findall
        @ .\array.jl:2253 [inlined]
     [10] stonep(a::Matrix{Int32}, cpos::Dict{Tuple, Integer}, k::Int64, hutsleft::Int64, newpt::Tuple{Int64, Int64})
        @ Main .\REPL[225]:36
     [11] (::var"#35#38"{Tuple{Int64, Int64}})()
        @ Main .\threadingconstructs.jl:178
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base .\task.jl:381
 [2] macro expansion
   @ .\task.jl:400 [inlined]
 [3] stonerr()
   @ Main .\REPL[212]:101
 [4] top-level scope
   @ REPL[230]:1 

Here is the code, which is the “minimum example” – sorry it’s so long!!!

# "minimum" example

using Base.Threads
using Random


function stonerr()

    n=7
    m=n*6

    s=21

    a= zeros(Int32,(42,42))

    a[20,19]=5
    a[21,19]=4
    a[21,20]=1
    a[21,21]=2
    a[21,22]=1
    a[22,20]=3

    cpos=Dict{Tuple, Integer}((23, 21) => 3, (22, 21) => 7, (20, 20) => 12, (23, 20) => 3, (21, 19) => 4, (22, 20) => 3, (21, 18) => 9, (20, 22) => 3, (20, 19) => 5, (22, 22) => 3, (23, 19) => 3, (20, 18) => 9, (22, 19) => 8, (20, 21) => 4)
    
    done=false
    k = Int32(6) # first number to search for
    final_k = deepcopy(k)
    lastpt = (20,19)
    hut_cnt = 5 # number of 1's left to place
    last_diag = false

    while !done

        mvals = findall(x-> x==k, cpos)
        if length(mvals)==0
            newpt = (0,0)
        else
            if length(mvals)==1
                newpt = mvals[1]
            else   
                rest = Dict()
                aa = Dict()
                cpp = Dict()         
                kd = Dict()
                hutd = Dict()
                ptd = Dict()
                for newpt in mvals
                    aa[newpt] = deepcopy(a)
                    cpp[newpt] = deepcopy(cpos)
                    kd[newpt] = deepcopy(k)
                    hutd[newpt] = deepcopy(hut_cnt)
                    ptd[newpt] = deepcopy(newpt) 
                end 

                @sync for newpt in mvals
                    rest[newpt] = @spawn stonep(aa[newpt],cpp[newpt],kd[newpt],hutd[newpt],ptd[newpt])
                end

                res = Dict()
                @sync  for i in keys(rest)
                    res[i] = fetch(rest[i])
                end 

                if length(res)>0
                    bestk = maximum(values(res))
                else
                    bestk = 0
                end 

                bestpts = ()
                for j in keys(res)
                    if res[j] == bestk 
                        bestpts = (bestpts...,j)
                        #break
                    end
                end

                if length(bestpts)>0
                    newpt=rand(bestpts)
                else
                    newpt = (0,0)
                end 
            end 
        end

        if newpt!=(0,0)

            r,c = newpt

            a[r,c]=k
            #println("at newpt updatecoords, newpt =",newpt)
            updatecoords!(cpos,a,r,c,k)

            lastpt = newpt

            k += 1 

        else 
            # need to place a hut 

            if hut_cnt==0
                final_k = k-1
                done=true
                break
            end 

            r,c = lastpt
            new_hut = (0,0)

            (gptsa,gptsb,fptsa,fptsb) = goodpt(a,k,r,c)

            diag_mov =  length(gptsa)>0
            ortho_mov = length(gptsb)>0

            allpts = (gptsa...,gptsb...,fptsa...,fptsb...)

            rest = Dict()
            aa = Dict()
            cpp = Dict()
            kd = Dict()
            hutd = Dict()
            ptd = Dict()
            for newpt in allpts
                aa[newpt] = deepcopy(a)
                cpp[newpt] = deepcopy(cpos)
                kd[newpt] = deepcopy(k)
                hutd[newpt] = deepcopy(hut_cnt)
                ptd[newpt] = deepcopy(newpt) 
            end 
            #println("At hut spawn stonep, allpts=",allpts)
            @sync for newpt in allpts
                rest[newpt] = @spawn stonep(aa[newpt],cpp[newpt],kd[newpt],hutd[newpt],ptd[newpt])
            end
            #println("\thut rest=",rest)

            res = Dict()
            @sync for i in keys(rest)
                res[i] = fetch(rest[i])
            end 

            for (i,j) in zip(keys(res),values(res))
                if j==-1
                    #println("in recovery")
                    aa[i]=deepcopy(a)
                    cpp[i]=deepcopy(cpos)
                    kd[i]=deepcopy(i)
                    hutd[i]=deepcopy(hut_cnt)
                    res[i] = stonep(aa[i],cpp[i],kd[i],hutd[i],i)
                end
            end

            #println("\thut res=",res)

            if length(res)>0
                bestk = maximum(values(res))
            end 

            bestpts = ()
            for j in keys(res)
                if res[j] == bestk 
                    bestpts = (bestpts...,j)
                end
            end

            if length(bestpts)>0
                new_hut=rand(bestpts)
            else
                new_hut = (0,0)
            end 

            if new_hut!=(0,0)

                r,c = new_hut

                a[r,c]=1
                #println("At hut update coords, hut=",new_hut)
                updatecoords!(cpos,a,r,c,k)

                lastpt = (r,c)

                hut_cnt -= 1
            else
                done=true
                final_k = k-1
            end 

        end

    end  # while 

    return(n,hut_cnt,final_k,a,cpos)

end # stonerr

# (n,hut_cnt,final_k,a,cpos)=stonerr()

function stonep(a, cpos, k, hutsleft, newpt)

    # initialize constants
    #posx=Dict(1  => (0,1), 2 => (1,1), 3=> (1,0), 4 => (1,-1), 5 => (0,-1), 6=> (-1,-1), 7 => (-1,0),8 => (-1,1))
    posx=((0,1),(1,1),(1,0),(1,-1),(0,-1),(-1,-1),(-1,0),(-1,1))
    #=println("Thread ID = ",threadid())
    println("input values")
    println("k = ",k)
    println("hutsleft = ",hutsleft)
    println("newpt = ",newpt)
    println()
    =#
    ################################
    # Loop and search              #
    ################################
    r,c = newpt
    done=false
    cloc = 0 
    lastpt = newpt
    hut_cnt = hutsleft 
    last_diag = false
    first_pass = true
    final_k = deepcopy(k)
try
    while !done

        if !first_pass
            #println(threadid(),"\tAt first pass\tk=",k)
            cloc=0.6
            if length(cpos)>0
                cloc=0.7
                #println(threadid(),"\tk = ",k);
                #println(threadid(),"\tcpos = ",cpos)
                cloc=0.71
                #println(threadid(),"\tk defined:", @isdefined(k))
                #println(threadid(),"\tcpos defined: ",@isdefined(cpos))
                cloc=0.72
                mvals = findall(x-> x==k, cpos)
                #mvals=findval(cpos,k)
                cloc=0.8
            else
                cloc=0.9
                mvals=()
                clocl=0.95
            end
            cloc=1
            #println(threadid(),"\tmvals =",mvals,"\thut_cnt=",hut_cnt)
            if length(mvals)==0
                newpt = (0,0)
            else
                cloc=1.2
                newpt = rand(mvals)
            end
        end

        if newpt!=(0,0)
            cloc=2
            r,c = newpt
            cloc=2.1
            #println(threadid(),"\tr = ",r,"\tc= ",c,"\tk = ",k)
            cloc=2.9
            a[r,c]=k
            cloc=3
            updatecoords!(cpos,a,r,c,k)
            cloc=4
            #println(threadid(),"\tlastpt = ",lastpt,"\tnewpt = ",newpt)
            lastpt = newpt
            cloc=4.1
            k += 1 
            cloc=4.2
            #println(threadid(),"\tnew k = ",k)

            first_pass = false
            cloc=4.3
        else 
            # need to place a hut 
            #println("Placing hut")
            if hut_cnt==0
                #println(threadid(),"\tDone at huts 1")
                #println("k = ",k,"\tk-1=",k-1)
                final_k = k-1
                #println(threadid(),"\tfinal_k = ",final_k)
                done=true
                cloc=5
                break
            end 

            r,c = lastpt
            new_hut = (0,0)
            cloc=5
            (gptsa,gptsb,fptsa,fptsb) = goodpt(a,k,r,c)
            cloc=6
            diag_mov =  length(gptsa)>0
            ortho_mov = length(gptsb)>0
            cloc=7
            if diag_mov & ortho_mov
                pick_mov = !last_diag
            elseif diag_mov 
                pick_mov = true
            elseif ortho_mov 
                pick_mov = false
            end 
            cloc=8
            if diag_mov | ortho_mov      # could both be false -- neeed hut
                if pick_mov  
                    if length(fptsa)==0 & rand((true,false))
                        cloc=9
                        new_hut = rand(gptsa)
                        last_diag = true
                    else
                        cloc=10
                        new_hut = rand(fptsa)
                        last_diag = true
                    end
                else
                    if length(fptsb)==0 & rand((true,false))
                        cloc=11
                        new_hut = rand(gptsb)
                        last_diag = false
                    else
                        cloc=12
                        new_hut = rand(fptsb)
                        last_diag = false
                    end
                end
            end 

            if new_hut!=(0,0)
                cloc=12
                r,c = new_hut
                #println(threadid(),"\thut\tr = ",r,"\tc= ",c)

                a[r,c]=1
                cloc=13
                updatecoords!(cpos,a,r,c,k)
                cloc=14
                lastpt = (r,c)

                hut_cnt -= 1
            else
                done=true
                #println(threadid(),"\tDone at huts 2")
                #println(threadid(),"\tk = ",k,"\tk-1=",k-1)
                final_k = k-1
                cloc=15
                #println(threadid(),"\tfinal_k = ",final_k)
            end 

            first_pass = false;

        end

    end
catch
    println("ERROR")
    println("cloc = ",cloc)
    println("k = ",k)
    println("cpos")
    show(cpos)
    println()
    rethrow()
end 
    #println("Exit final_k = ",final_k)
    #println(threadid(),"\tEXIT THREAD")
    return(final_k)

end # stonep



function updatecoords!(p::Dict, m::Matrix, r::Integer, c::Integer, kp::Integer)

    posx=((0,1),(1,1),(1,0),(1,-1),(0,-1),(-1,-1),(-1,0),(-1,1))


    for rc in values(posx)
        r0=r+rc[1]
        c0=c+rc[2]
        #println("ro = ",r0,"\tc0 = ",c0)
        ckpt=(r0,c0)

        if m[r0,c0]==0 
            val=sumpt(m,r0,c0)
            if val>=kp 
                #println("check point = ",ckpt,"\tval=",val)
                p[ckpt]=val
            end 
        end 

    end # for 

end # updatecoords


function goodpt(m::Matrix,k1::Integer, r::Integer, c::Integer)

    posx=((0,1),(1,1),(1,0),(1,-1),(0,-1),(-1,-1),(-1,0),(-1,1))
    posx1=((1,1),(-1,1),(1,-1),(-1,-1))
    posx2=((0,1),(1,0),(-1,0),(0,-1))

    gpts1 = ()
    gpts2 = ()
    fpts1 = ()
    fpts2 = ()

    for i in 1:4
        rotr, rotc = posx1[i]
        r1 = r+rotr
        c1 = c+rotc
        if m[r1,c1]!=0
            continue
        end

        if sumpt(m,r1,c1)+1 == k1
            gpts1 = (gpts1...,(r1,c1))
        end
    end # for 

    for i in 1:4
        rotr, rotc = posx2[i]
        r1 = r+rotr
        c1 = c+rotc
        if m[r1,c1]!=0
            continue
        end

        if sumpt(m,r1,c1)+1 == k1
            gpts2 = (gpts2...,(r1,c1))
        end
    end # for 

    # now check 2 out from r,c based on the good one positions
    if length(gpts1)>0
        for i in 1:8
            rotr, rotc = posx[i]
            for rc in gpts1
                r1 = rc[1]+rotr
                c1 = rc[2]+rotc
                if m[r1,c1]!=0
                    continue
                end

                if sumpt(m,r1,c1)+k1 == k1
                    fpts1 = (fpts1...,(r1,c1))
                end 

            end
        end
    end

    if length(gpts2)>0
        for i in 1:8
            rotr, rotc = posx[i]
            for rc in gpts2
                r1 = rc[1]+rotr
                c1 = rc[2]+rotc
                if m[r1,c1]!=0
                    continue
                end

                if sumpt(m,r1,c1)+k1 == k1
                    fpts2 = (fpts2...,(r1,c1))
                end 

            end
        end
    end


    return((gpts1,gpts2,fpts1,fpts2))

end # goodpt 


function sumpt(m::Matrix, r::Integer, c::Integer) 

    return(m[r,c]+m[r,c+1]+m[r+1,c+1]+m[r+1,c]+m[r+1,c-1]+m[r,c-1]+m[r-1,c-1]+m[r-1,c]+m[r-1,c+1])

end

I didn’t check your code, but I had also problems with @spawn.
The best way to solve it for me was to use spawnbg from the package ThreadPools.