Something strange is happening to the Windows GitHub Actions runs for SymbolicRegression.jl, which is either caused by or manifesting in a Distributed.jl error.
The action was working 9 days ago, for all operating systems.
Today, I noticed my Windows runs were breaking. The error doesn’t seem to go away with new tweaks. Confused, I re-ran the last working commit with the exact same action, and those Windows runs now break!
Then, I thought: perhaps this is just the new patch of windows-latest breaking something? So I tried windows-2019 and windows-2022. Also now broken!
The weird thing is that Julia 1.5 is still working, but Julia 1.6 through 1.8 are all broken, on all versions of Windows, despite the code not changing at all.
Perhaps this is one of:
- New GitHub Actions compute hardware, which breaks newer versions of Distributed.jl on Windows?
- New patches of windows-2019 and windows-2022, which break newer versions of Distributed.jl?
- One of my dependencies’ updates introducing a bug, even though the issue seems to be coming from Distributed.jl, which is in the standard library?
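One way to test the dependency hypothesis would be to print the resolved package versions in each CI run, then diff the log of the last passing run against a failing one. The snippet below is only a sketch of that idea: the helper name `dependency_snapshot` is made up for illustration, and it assumes the test environment is the active project when it runs.

```julia
using Pkg
using InteractiveUtils  # provides versioninfo()

# Hypothetical helper: collect (name, version) for every package in the
# active manifest, sorted by name so two CI logs can be diffed line by line.
# Some entries (e.g. standard libraries) may report `nothing` as the version.
function dependency_snapshot()
    deps = Pkg.dependencies()
    entries = [(info.name, string(info.version)) for info in values(deps)]
    return sort(entries)
end

# Print the Julia build and platform details, then the resolved versions.
versioninfo()
for (name, version) in dependency_snapshot()
    println(name, " ", version)
end
```

If the two snapshots are identical and only the runner image differs, that would make a dependency update less likely as the cause.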
Any ideas or things to try greatly appreciated!
This is the specific error (this run):
Unhandled Task ERROR: IOError: read: connection reset by peer (ECONNRESET)
Stacktrace:
[1] wait_readnb(x::Sockets.TCPSocket, nb::Int64)
@ Base .\stream.jl:410
[2] (::Base.var"#wait_locked#679")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
@ Base .\stream.jl:944
[3] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
@ Base .\stream.jl:950
[4] unsafe_read
@ .\io.jl:759 [inlined]
[5] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
@ Base .\io.jl:758
[6] read!
@ .\io.jl:760 [inlined]
[7] deserialize_hdr_raw
@ C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\messages.jl:167 [inlined]
[8] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:172
[9] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:133
[10] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
@ Distributed .\task.jl:484
Error During Test
at Worker 2 terminated.Unhandled Task ERROR: IOError: read: connection reset by peer (ECONNRESET)
Stacktrace:
[1] wait_readnb(x::Sockets.TCPSocket, nb::Int64)
@ Base .\stream.jl:410
[2] (::Base.var"#wait_locked#679")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
@ Base .\stream.jl:944
[3] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
@ Base .\stream.jl:950
[4] unsafe_read
@ .\io.jl:759 [inlined]
[5] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
@ Base .\io.jl:758
[6] read!
@ .\io.jl:760 [inlined]
[7] deserialize_hdr_raw
@ C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\messages.jl:167 [inlined]
[8] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:172
[9] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:133
[10] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
@ Distributed .\task.jl:484
C:\Users\runneradmin\.julia\packages\SafeTestsets\A83XK\src\SafeTestsets.jl:25
Got exception outside of a @testWorker 5 terminated.Unhandled Task ERROR: IOError: read: connection reset by peer (ECONNRESET)
Stacktrace:
[1] wait_readnb(x::Sockets.TCPSocket, nb::Int64)
@ Base .\stream.jl:410
[2] (::Base.var"#wait_locked#679")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
@ Base .\stream.jl:944
[3] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
@ Base .\stream.jl:950
[4] unsafe_read
@ .\io.jl:759 [inlined]
[5] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
@ Base .\io.jl:758
[6] read!
@ .\io.jl:760 [inlined]
[7] deserialize_hdr_raw
@ C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\messages.jl:167 [inlined]
[8] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:172
[9] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:133
[10] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
@ Distributed .\task.jl:484
LoadError: Distributed.ProcessExitedException(2)Unhandled Task ERROR: IOError: read: connection reset by peer (ECONNRESET)
Stacktrace:
[1] wait_readnb(x::Sockets.TCPSocket, nb::Int64)
@ Base .\stream.jl:410
[2] (::Base.var"#wait_locked#679")(s::Sockets.TCPSocket, buf::IOBuffer, nb::Int64)
@ Base .\stream.jl:944
[3] unsafe_read(s::Sockets.TCPSocket, p::Ptr{UInt8}, nb::UInt64)
@ Base .\stream.jl:950
[4] unsafe_read
@ .\io.jl:759 [inlined]
[5] unsafe_read(s::Sockets.TCPSocket, p::Base.RefValue{NTuple{4, Int64}}, n::Int64)
@ Base .\io.jl:758
[6] read!
@ .\io.jl:760 [inlined]
[7] deserialize_hdr_raw
@ C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\messages.jl:167 [inlined]
[8] message_handler_loop(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:172
[9] process_tcp_streams(r_stream::Sockets.TCPSocket, w_stream::Sockets.TCPSocket, incoming::Bool)
@ Distributed C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\process_messages.jl:133
[10] (::Distributed.var"#103#104"{Sockets.TCPSocket, Sockets.TCPSocket, Bool})()
@ Distributed .\task.jl:484
...and 3 more exceptions.
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base .\task.jl:436
[2] macro expansion
@ .\task.jl:455 [inlined]
[3] remotecall_eval(m::Module, procs::Vector{Int64}, ex::Expr)
@ Distributed C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\macros.jl:219
[4] macro expansion
@ C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Distributed\src\macros.jl:203 [inlined]
[5] import_module_on_workers(procs::Vector{Int64}, filename::String, options::SymbolicRegression.CoreModule.OptionsStructModule.Options{Tuple{typeof(+), typeof(*)}, Tuple{typeof(cos)}, Nothing, Nothing, LossFunctions.L2DistLoss, Int64})
@ SymbolicRegression D:\a\SymbolicRegression.jl\SymbolicRegression.jl\src\Configure.jl:199
[6] _EquationSearch(::SymbolicRegression.CoreModule.ProgramConstantsModule.SRDistributed, datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{Float32}}; niterations::Int64, options::SymbolicRegression.CoreModule.OptionsStructModule.Options{Tuple{typeof(+), typeof(*)}, Tuple{typeof(cos)}, Nothing, Nothing, LossFunctions.L2DistLoss, Int64}, numprocs::Nothing, procs::Nothing, runtests::Bool, saved_state::Nothing, addprocs_function::Nothing)
@ SymbolicRegression D:\a\SymbolicRegression.jl\SymbolicRegression.jl\src\SymbolicRegression.jl:513
[7] EquationSearch(datasets::Vector{SymbolicRegression.CoreModule.DatasetModule.Dataset{Float32}}; niterations::Int64, options::SymbolicRegression.CoreModule.OptionsStructModule.Options{Tuple{typeof(+), typeof(*)}, Tuple{typeof(cos)}, Nothing, Nothing, LossFunctions.L2DistLoss, Int64}, numprocs::Nothing, procs::Nothing, multithreading::Bool, runtests::Bool, saved_state::Nothing, addprocs_function::Nothing)
@ SymbolicRegression D:\a\SymbolicRegression.jl\SymbolicRegression.jl\src\SymbolicRegression.jl:327
[8] EquationSearch(X::Matrix{Float32}, y::LinearAlgebra.Transpose{Float32, Matrix{Float32}}; niterations::Int64, weights::Nothing, varMap::Nothing, options::SymbolicRegression.CoreModule.OptionsStructModule.Options{Tuple{typeof(+), typeof(*)}, Tuple{typeof(cos)}, Nothing, Nothing, LossFunctions.L2DistLoss, Int64}, numprocs::Nothing, procs::Nothing, multithreading::Bool, runtests::Bool, saved_state::Nothing, addprocs_function::Nothing)
@ SymbolicRegression D:\a\SymbolicRegression.jl\SymbolicRegression.jl\src\SymbolicRegression.jl:271
[9] top-level scope
@ D:\a\SymbolicRegression.jl\SymbolicRegression.jl\test\full.jl:104
[10] include(mod::Module, _path::String)
@ Base .\Base.jl:419
[11] include(x::String)
@ Main.var"##361" C:\Users\runneradmin\.julia\packages\SafeTestsets\A83XK\src\SafeTestsets.jl:23
[12] macro expansion
@ D:\a\SymbolicRegression.jl\SymbolicRegression.jl\test\runtests.jl:7 [inlined]
[13] macro expansion
@ C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Test\src\Test.jl:1357 [inlined]
[14] top-level scope
@ D:\a\SymbolicRegression.jl\SymbolicRegression.jl\test\runtests.jl:7
[15] eval(m::Module, e::Any)
@ Core .\boot.jl:368
[16] top-level scope
@ C:\Users\runneradmin\.julia\packages\SafeTestsets\A83XK\src\SafeTestsets.jl:23
[17] include(fname::String)
@ Base.MainInclude .\client.jl:476
[18] top-level scope
@ none:6
[19] eval
@ .\boot.jl:368 [inlined]
[20] exec_options(opts::Base.JLOptions)
@ Base .\client.jl:276
[21] _start()
@ Base .\client.jl:522
in expression starting at D:\a\SymbolicRegression.jl\SymbolicRegression.jl\test\full.jl:12
Test Summary: | Pass Error Total Time
End to end test | 2 1 3 1m51.5s
ERROR: LoadError: Some tests did not pass: 2 passed, 0 failed, 1 errored, 0 broken.
in expression starting at D:\a\SymbolicRegression.jl\SymbolicRegression.jl\test\runtests.jl:6
ERROR: Package SymbolicRegression errored during testing
Stacktrace:
[1] pkgerror(msg::String)
@ Pkg.Types C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Pkg\src\Types.jl:67
[2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
@ Pkg.Operations C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Pkg\src\Operations.jl:1813
[3] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Cmd, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::Base.Pairs{Symbol, IOContext{Base.PipeEndpoint}, Tuple{Symbol}, NamedTuple{(:io,), Tuple{IOContext{Base.PipeEndpoint}}}})
@ Pkg.API C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Pkg\src\API.jl:431
[4] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{Base.PipeEndpoint}, kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:coverage,), Tuple{Bool}}})
@ Pkg.API C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Pkg\src\API.jl:156
[5] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:coverage,), Tuple{Bool}}})
@ Pkg.API C:\hostedtoolcache\windows\julia\1.8.0\x64\share\julia\stdlib\v1.8\Pkg\src\API.jl:171
[6] top-level scope
@ none:1
This is the line of code which the workers are failing on: https://github.com/MilesCranmer/SymbolicRegression.jl/blob/284ec196702bb8d08769a58ac119cf14674b79a6/src/Configure.jl#L199-L201
@everywhere procs begin
Base.MainInclude.eval(using SymbolicRegression)
end
This line imports the package on all the workers (since addprocs is called within the library, rather than by the user). I’ve never had an issue with this before; I have even scaled up to 1000s of workers across a Slurm cluster just fine.
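To check whether the failure is specific to SymbolicRegression or would happen with any import on freshly added workers, a stripped-down reproduction might help. The sketch below is an assumption-laden stand-in, not the library's code: `check_worker_import` is a hypothetical name, it loads the standard-library module `Random` instead of SymbolicRegression, and it uses `Core.eval(Main, :(using Random))` as a simplified substitute for the call shown above.

```julia
using Distributed

# Hypothetical helper: spawn local workers, load a standard-library module on
# them with the same `@everywhere procs` pattern, and report for each worker
# whether the module ended up defined in its Main.
function check_worker_import(nworkers::Integer=2)
    procs = addprocs(nworkers)
    try
        @everywhere procs begin
            Core.eval(Main, :(using Random))
        end
        return [remotecall_fetch(() -> isdefined(Main, :Random), p) for p in procs]
    finally
        # Always remove the workers, even if the import step throws.
        rmprocs(procs)
    end
end

println(check_worker_import(2))
```

If this also ends in a `ProcessExitedException` on the Windows runners with Julia 1.6 through 1.8, that would point toward Distributed.jl or the runner image rather than the package or its dependencies.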