Weird UndefRefError in pmap since 1.8

Hey. I am kind of new to the language and my coding knowledge is very minimal. Therefore I am unable to solve (even understand) the problem I have right now.

My code is kind of large and convoluted and I don’t know how to really extract a minimal example. I cannot even reproduce the error in the full code reliably.

I use something like this in my code:

function fill_arr1!(arr1::Array,parameter::Float64,d::Int64,L::Int64)

    # allocate buffers
    arr2 = zeros(d)
    arr3 = zeros(d)

    # execute routines with pmap
    result1 = pmap(idx -> routine1(idx, parameter, arr2, arr3), 1:L)
    result2 = pmap(idx -> routine2(idx, parameter, arr2, arr3), 1:L)

    return result1, result2
end

This function is part of a deq-solver and is used once per step. Sometimes the whole code works without error and sometimes at some random iteration step I get the following error (I omitted the last line which basically says that the error happens in the function I tried to paraphrase above):

ERROR: On worker 2:
UndefRefError: access to undefined reference
Stacktrace:
  [1] getindex
    @ ./array.jl:924 [inlined]
  [2] getindex
    @ ./abstractarray.jl:1244 [inlined]
  [3] desertag
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:100 [inlined]
  [4] handle_deserialize
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:851
  [5] deserialize
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:1479
  [6] handle_deserialize
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:877
  [7] deserialize
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:1479
  [8] handle_deserialize
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:877
  [9] deserialize
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813 [inlined]
 [10] deserialize_msg
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/messages.jl:87
 [11] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [12] invokelatest
    @ ./essentials.jl:726 [inlined]
 [13] message_handler_loop
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:176
 [14] process_tcp_streams
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:133
 [15] #103
    @ ./task.jl:484
Stacktrace:
  [1] (::Base.var"#939#941")(x::Task)
    @ Base ./asyncmap.jl:177
  [2] foreach(f::Base.var"#939#941", itr::Vector{Any})
    @ Base ./abstractarray.jl:2774
  [3] maptwice(wrapped_f::Function, chnl::Channel{Any}, worker_tasks::Vector{Any}, c::UnitRange{Int64})
    @ Base ./asyncmap.jl:177
  [4] wrap_n_exec_twice
    @ ./asyncmap.jl:153 [inlined]
  [5] #async_usemap#924
    @ ./asyncmap.jl:103 [inlined]
  [6] #asyncmap#923
    @ ./asyncmap.jl:81 [inlined]
  [7] pmap(f::Function, p::WorkerPool, c::UnitRange{Int64}; distributed::Bool, batch_size::Int64, on_error::Nothing, retry_delays::Vector{Any}, retry_check::Nothing)
    @ Distributed /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/pmap.jl:126
  [8] pmap
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/pmap.jl:99 [inlined]
  [9] #pmap#233
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/pmap.jl:156 [inlined]
 [10] pmap
    @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/pmap.jl:156 [inlined]

Sorry, I cannot make it more concrete than this. Anyway: on my old machine with Windows and Julia 1.7, this worked flawlessly. Now I am working on a new machine (MacBook) with Julia 1.8 and sometimes get this error.

For someone with little experience this error message is really hard to comprehend, but I figure that some of the inputs to the anonymous functions include something like an undef value. I simply cannot locate this behavior in the code. The fact that the code ran perfectly well before also makes me believe that this actually has something to do with the newer Julia version and not with my code. Strangely, when I replace the two pmaps with simple loops over all desired indices, the code also works without an error.

Hi, I know this is almost a year old at this point, but did you ever figure out what was going on? I’m getting a similar error now too. I also have a pretty large and convoluted code. I’ve tried to make a minimum working example, but so far I can’t reproduce the error with it, even though I tried matching the conditions of my code as closely as possible.

It’s weird because even when I’m running my main code, the error is inconsistent. Sometimes it will happen, sometimes it’s a completely different error message, and sometimes it will work flawlessly without any errors. This is all without me changing a single line of code: just running the script twice in a row produces different results.

I had also been running this code flawlessly for months until I upgraded to macOS Sonoma a few days ago, so I thought that might be the issue, but seeing as you were getting a similar error many months ago, it seems like that’s not the case. I didn’t update my Julia version either; I’ve been using 1.9 for months as well. I know the problem is with the use of the pmap function, because if I do an @distributed for loop instead it works perfectly, but I haven’t been able to find anything more specific than that.

Thanks, sorry for the long message!

We really need a minimum working example to figure out what is happening. That is, we need code that we can run to reproduce the problem.

Speculating wildly, sounds like there’s some uninitialised memory somewhere (e.g. through similar calls) that’s being used before being written to. This will produce different results on each run, and might potentially lead to undef ref errors if it’s a mutable struct being referenced.
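To make the speculation above concrete, here is a minimal sketch of how an array allocated without initialization can raise an UndefRefError. The type name `Cell` and the variable names are made up for illustration; nothing here is taken from the code in this thread, and it does not show that this is what happens inside pmap.

```julia
# Hypothetical mutable type, used only for illustration.
mutable struct Cell
    value::Float64
end

# An `undef` allocation of a reference type leaves every slot unassigned.
cells = Vector{Cell}(undef, 3)
println(isassigned(cells, 1))

# `similar` keeps the element type and also leaves the slots unassigned.
template = [Cell(1.0), Cell(2.0)]
scratch = similar(template)

# Reading an unassigned slot throws; catch the exception to inspect it.
err = try
    scratch[1]
catch e
    e
end
println(typeof(err))

# For bits types such as Float64 there are no undefined references,
# only arbitrary contents, so reading never throws this error.
buf = similar(zeros(3))
println(isassigned(buf, 1))
```

Checking suspicious buffers with `isassigned` before they are read is one cheap way to rule this explanation in or out.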

I suspect a race condition due to improper partitioning, serialization, and deserialization. It’s really hard to tell without executable code though.

Ok, so I have been able to make a “heavily reduced” working example, but I realize it’s still a bit big for a “minimum working example” - my apologies for that, but if I reduced it any further then the error stopped appearing. You’ll notice in particular that there are a lot of packages being imported that are not used…I tried the same code without importing these packages and it seems to work pretty consistently, which is even more puzzling to me considering that I don’t use them at all? Anyways, here’s the example:

I have a package file which looks like this:

module JuliaBug

# Importing all of the dependencies

# Parallel computing packages
using Distributed
using SharedArrays

# Math packages
using Distributions
using Random
using Statistics
using StatsBase
using NaNStatistics
using QuadGK
using NumericalIntegration
using Dierckx
using LinearAlgebra
using FFTW
using SpecialFunctions
using Polynomials
using NLsolve
using ImageFiltering
using ImageTransformations

# Optimization packages
using Optim
using CMPFit

# Astronomy packages
using AstroLib
using FITSIO
using Photometry
using Cosmology
using AstroAngles
using SkyCoords
using WCS
using Reproject
using Unitful, UnitfulAstro

# File I/O
using Glob
using TOML
using DelimitedFiles
using CSV
using Serialization
using DataFrames

# Plotting packages
using PlotlyJS

# Misc packages/utilites
using ProgressMeter
using Printf
using Logging
using LoggingExtras
using Dates
using InteractiveUtils
using ColorSchemes
using LaTeXStrings


export Data

mutable struct Data

    λ::Vector{<:Real}
    I::Array{<:Real,3}
    σ::Array{<:Real,3}

end


end

And then a script which uses the module which looks like this:

using Distributed
procs = addprocs(4)

@everywhere using JuliaBug

@everywhere data = Data(
    rand(6705),
    rand(43,43,6705),
    rand(43,43,6705)
)

result = pmap(1:100) do i
    data.I[1] * i
end
println(result)

rmprocs(procs)

I would run this script as julia --project=. test_script.jl from within the project folder for the above module.

Running this script 4 times in a row without making any changes produced 2 different errors and then the fourth time it magically worked:

julia --project=. test_script.jl
ERROR: LoadError: On worker 2:
MethodError: Cannot `convert` an object of type Symbol to an object of type Tuple

Closest candidates are:
  convert(::Type{T}, ::T) where T<:Tuple
   @ Base essentials.jl:411
  convert(::Type{T}, ::Tuple{Vararg{Any, N}}) where {N, T<:Tuple}
   @ Base essentials.jl:412
  convert(::Type{T}, ::CartesianIndex) where T<:Tuple
   @ Base multidimensional.jl:128
  ...

Stacktrace:
 [1] CallMsg
   @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/messages.jl:27
 [2] deserialize_msg
   @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/messages.jl:87
 [3] #invokelatest#2
   @ ./essentials.jl:819 [inlined]
 [4] invokelatest
   @ ./essentials.jl:816 [inlined]
 [5] message_handler_loop
   @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:176
 [6] process_tcp_streams
   @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133
 [7] #103
   @ ./task.jl:514
Stacktrace:
  [1] (::Base.var"#988#990")(x::Task)
    @ Base ./asyncmap.jl:177
  [2] foreach(f::Base.var"#988#990", itr::Vector{Any})
    @ Base ./abstractarray.jl:3075
  [3] maptwice(wrapped_f::Function, chnl::Channel{Any}, worker_tasks::Vector{Any}, c::UnitRange{Int64})
    @ Base ./asyncmap.jl:177
  [4] wrap_n_exec_twice
    @ ./asyncmap.jl:153 [inlined]
  [5] #async_usemap#973
    @ ./asyncmap.jl:103 [inlined]
  [6] async_usemap
    @ ./asyncmap.jl:84 [inlined]
  [7] #asyncmap#972
    @ ./asyncmap.jl:81 [inlined]
  [8] asyncmap
    @ ./asyncmap.jl:80 [inlined]
  [9] pmap(f::Function, p::WorkerPool, c::UnitRange{Int64}; distributed::Bool, batch_size::Int64, on_error::Nothing, retry_delays::Vector{Any}, retry_check::Nothing)
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:126
 [10] pmap(f::Function, p::WorkerPool, c::UnitRange{Int64})
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:99
 [11] pmap(f::Function, c::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:156
 [12] pmap(f::Function, c::UnitRange{Int64})
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:156
 [13] top-level scope
    @ ~/Dropbox/JuliaBug/test_script.jl:12
in expression starting at /Users/mreefe/Dropbox/JuliaBug/test_script.jl:12
julia --project=. test_script.jl
ERROR: LoadError: On worker 2:
Inconsistent Serializer state when deserializing.
    Attempt to access internal table with key 2127151348 failed.

    This might occur if the Serializer contexts when serializing and deserializing are inconsistent.
    In particular, if multiple serialize calls use the same Serializer object then
    the corresponding deserialize calls should also use the same Serializer object.

Stacktrace:
  [1] error
    @ ./error.jl:35
  [2] #3
    @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Serialization/src/Serialization.jl:845
  [3] get
    @ ./iddict.jl:169 [inlined]
  [4] gettable
    @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Serialization/src/Serialization.jl:837 [inlined]
  [5] handle_deserialize
    @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Serialization/src/Serialization.jl:865
  [6] deserialize
    @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Serialization/src/Serialization.jl:816 [inlined]
  [7] deserialize_msg
    @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/messages.jl:87
  [8] #invokelatest#2
    @ ./essentials.jl:819 [inlined]
  [9] invokelatest
    @ ./essentials.jl:816 [inlined]
 [10] message_handler_loop
    @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:176
 [11] process_tcp_streams
    @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133
 [12] #103
    @ ./task.jl:514
Stacktrace:
  [1] (::Base.var"#988#990")(x::Task)
    @ Base ./asyncmap.jl:177
  [2] foreach(f::Base.var"#988#990", itr::Vector{Any})
    @ Base ./abstractarray.jl:3075
  [3] maptwice(wrapped_f::Function, chnl::Channel{Any}, worker_tasks::Vector{Any}, c::UnitRange{Int64})
    @ Base ./asyncmap.jl:177
  [4] wrap_n_exec_twice
    @ ./asyncmap.jl:153 [inlined]
  [5] #async_usemap#973
    @ ./asyncmap.jl:103 [inlined]
  [6] async_usemap
    @ ./asyncmap.jl:84 [inlined]
  [7] #asyncmap#972
    @ ./asyncmap.jl:81 [inlined]
  [8] asyncmap
    @ ./asyncmap.jl:80 [inlined]
  [9] pmap(f::Function, p::WorkerPool, c::UnitRange{Int64}; distributed::Bool, batch_size::Int64, on_error::Nothing, retry_delays::Vector{Any}, retry_check::Nothing)
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:126
 [10] pmap(f::Function, p::WorkerPool, c::UnitRange{Int64})
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:99
 [11] pmap(f::Function, c::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:156
 [12] pmap(f::Function, c::UnitRange{Int64})
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:156
 [13] top-level scope
    @ ~/Dropbox/JuliaBug/test_script.jl:12
in expression starting at /Users/mreefe/Dropbox/JuliaBug/test_script.jl:12
julia --project=. test_script.jl
ERROR: LoadError: On worker 3:
MethodError: Cannot `convert` an object of type Symbol to an object of type Tuple

Closest candidates are:
  convert(::Type{T}, ::T) where T<:Tuple
   @ Base essentials.jl:411
  convert(::Type{T}, ::Tuple{Vararg{Any, N}}) where {N, T<:Tuple}
   @ Base essentials.jl:412
  convert(::Type{T}, ::CartesianIndex) where T<:Tuple
   @ Base multidimensional.jl:128
  ...

Stacktrace:
 [1] CallMsg
   @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/messages.jl:27
 [2] deserialize_msg
   @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/messages.jl:87
 [3] #invokelatest#2
   @ ./essentials.jl:819 [inlined]
 [4] invokelatest
   @ ./essentials.jl:816 [inlined]
 [5] message_handler_loop
   @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:176
 [6] process_tcp_streams
   @ /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/process_messages.jl:133
 [7] #103
   @ ./task.jl:514
Stacktrace:
  [1] (::Base.var"#988#990")(x::Task)
    @ Base ./asyncmap.jl:177
  [2] foreach(f::Base.var"#988#990", itr::Vector{Any})
    @ Base ./abstractarray.jl:3075
  [3] maptwice(wrapped_f::Function, chnl::Channel{Any}, worker_tasks::Vector{Any}, c::UnitRange{Int64})
    @ Base ./asyncmap.jl:177
  [4] wrap_n_exec_twice
    @ ./asyncmap.jl:153 [inlined]
  [5] #async_usemap#973
    @ ./asyncmap.jl:103 [inlined]
  [6] async_usemap
    @ ./asyncmap.jl:84 [inlined]
  [7] #asyncmap#972
    @ ./asyncmap.jl:81 [inlined]
  [8] asyncmap
    @ ./asyncmap.jl:80 [inlined]
  [9] pmap(f::Function, p::WorkerPool, c::UnitRange{Int64}; distributed::Bool, batch_size::Int64, on_error::Nothing, retry_delays::Vector{Any}, retry_check::Nothing)
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:126
 [10] pmap(f::Function, p::WorkerPool, c::UnitRange{Int64})
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:99
 [11] pmap(f::Function, c::UnitRange{Int64}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:156
 [12] pmap(f::Function, c::UnitRange{Int64})
    @ Distributed /Applications/Julia-1.9.app/Contents/Resources/julia/share/julia/stdlib/v1.9/Distributed/src/pmap.jl:156
 [13] top-level scope
    @ ~/Dropbox/JuliaBug/test_script.jl:12
in expression starting at /Users/mreefe/Dropbox/JuliaBug/test_script.jl:12
julia --project=. test_script.jl
[0.04180670359355998, 0.08361340718711996, 0.12542011078067994, 0.16722681437423992, 0.2090335179677999, 0.2508402215613599, 0.29264692515491986, 0.33445362874847984, 0.3762603323420398, 0.4180670359355998, 0.4598737395291598, 0.5016804431227198, 0.5434871467162797, 0.5852938503098397, 0.6271005539033997, 0.6689072574969597, 0.7107139610905197, 0.7525206646840796, 0.7943273682776396, 0.8361340718711996, 0.8779407754647596, 0.9197474790583196, 0.9615541826518795, 1.0033608862454395, 1.0451675898389996, 1.0869742934325595, 1.1287809970261193, 1.1705877006196794, 1.2123944042132395, 1.2542011078067994, 1.2960078114003593, 1.3378145149939193, 1.3796212185874794, 1.4214279221810393, 1.4632346257745992, 1.5050413293681593, 1.5468480329617194, 1.5886547365552792, 1.630461440148839, 1.6722681437423992, 1.7140748473359593, 1.7558815509295191, 1.797688254523079, 1.839494958116639, 1.8813016617101992, 1.923108365303759, 1.964915068897319, 2.006721772490879, 2.048528476084439, 2.090335179677999, 2.132141883271559, 2.173948586865119, 2.215755290458679, 2.2575619940522387, 2.2993686976457988, 2.341175401239359, 2.382982104832919, 2.424788808426479, 2.4665955120200387, 2.5084022156135988, 2.550208919207159, 2.5920156228007185, 2.6338223263942786, 2.6756290299878387, 2.717435733581399, 2.759242437174959, 2.8010491407685185, 2.8428558443620786, 2.8846625479556387, 2.9264692515491983, 2.9682759551427584, 3.0100826587363185, 3.0518893623298786, 3.0936960659234387, 3.1355027695169984, 3.1773094731105584, 3.2191161767041185, 3.260922880297678, 3.3027295838912383, 3.3445362874847984, 3.3863429910783585, 3.4281496946719185, 3.469956398265478, 3.5117631018590383, 3.5535698054525984, 3.595376509046158, 3.637183212639718, 3.678989916233278, 3.7207966198268383, 3.7626033234203984, 3.804410027013958, 3.846216730607518, 3.888023434201078, 3.929830137794638, 3.971636841388198, 4.013443544981758, 4.055250248575318, 4.097056952168878, 4.138863655762438, 4.180670359355998]

I also realize that none of the errors here are the same as the original UndefRefError from before. I was getting that error in my main code but for some reason now with this example I’m getting a bunch of other errors, but not that one…

It is interesting that you mentioned referencing a mutable struct, because that is something I’m doing, but I’m not writing to it during the pmap, just reading from it. And all of the data should be there before the pmap starts. Would that make a difference? I did try substituting the pmap with a simpler one that just returns i for each iteration without referencing the data struct, and that seems to work every time, but I would like to be able to reference a struct within the pmap if possible.
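For comparison, one variation that might be worth trying is to keep the struct out of global scope and let the pmap closure capture it as a local variable, sent through a CachingPool. This is only a sketch: `Payload` and `scale_first` are hypothetical names, the struct is a small, concretely typed stand-in for Data, and whether this avoids the errors above is untested.

```julia
using Distributed
addprocs(2)

# Hypothetical, concretely typed stand-in for the Data struct above.
@everywhere struct Payload
    λ::Vector{Float64}
    I::Array{Float64,3}
end

# `payload` is a local variable captured by the closure, not a global in Main.
# A CachingPool transfers the closure (and its captured data) to each worker
# once instead of once per call.
function scale_first(payload::Payload, n::Int)
    pool = CachingPool(workers())
    return pmap(i -> payload.I[1] * i, pool, 1:n)
end

payload = Payload(rand(5), rand(2, 2, 5))
result = scale_first(payload, 10)

# Compare against the same computation done serially on the main process.
expected = [payload.I[1] * i for i in 1:10]
println(result == expected)

rmprocs(workers())
```

The serial comparison at the end doubles as a quick sanity check that the workers saw the same data as the main process.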

Sorry again for the bloated example!