Hi, I have been running code in parallel using julia -p200 for a while. Now I am using a cloud server that has:
CPU(s) 96
Cores Per Socket 24
Sockets 2
By my calculations I should be able to run 96x24x2=4608 threads. However, when I try to run something like julia -p750 (julia -p500 works), I get the following error:
brett_israelsen@instance-1:~/GitProjects/self_confidence/road_net$ julia -p4600
ERROR (unhandled task failure): pipe_link: too many open files (EMFILE)
Stacktrace:
[1] setup_stdio(::Base.##374#375{Cmd}, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:497
[2] #spawn#373(::Nullable{Base.ProcessChain}, ::Function, ::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:511
[3] (::Base.#kw##spawn)(::Array{Any,1}, ::Base.#spawn, ::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./<missing>:0
[4] #spawn#370(::Nullable{Base.ProcessChain}, ::Function, ::Base.CmdRedirect, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:392
[5] spawn(::Base.CmdRedirect, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:392
[6] (::Base.Distributed.##31#34{Base.Distributed.LocalManager,Dict{Any,Any},Array{WorkerConfig,1},Condition})() at ./event.jl:73
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Worker 465 terminated.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
ERROR (unhandled task failure): Version read failed. Connection closed by peer.
Stacktrace:
[1] (::Base.Distributed.##99#100{TCPSocket,TCPSocket,Bool})() at ./event.jl:73
Worker 464 terminated.
ERROR (unhandled task failure): Version read failed. Connection closed by peer.
Stacktrace:
[1] (::Base.Distributed.##99#100{TCPSocket,TCPSocket,Bool})() at ./event.jl:73
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
.
.
.
Since I am fairly new to HPC, I may have unrealistic expectations or be doing something wrong. Hopefully someone can help me out: I would like to use all of the available threads, and I am not sure how to do that, or whether I already am. Thanks!
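
In case it helps with diagnosis, here is a minimal sketch of how the two limits that seem relevant could be checked, assuming a Linux shell with GNU coreutils. The variable names are only illustrative. My assumption, which I have not confirmed, is that each local worker costs the master process some file descriptors for its pipes and sockets, so the open-file limit rather than the CPU count may be what the EMFILE error is hitting.

```shell
# Number of logical CPUs available to this process
# (normally the same figure lscpu reports as "CPU(s)").
logical_cpus=$(nproc)

# Soft limit on open file descriptors for this shell and the processes it starts;
# EMFILE ("too many open files") is the error raised when a process reaches it.
fd_limit=$(ulimit -n)

echo "logical CPUs: ${logical_cpus}"
echo "open-file limit: ${fd_limit}"
```

Comparing the second number with the worker count at which julia -p starts failing might show whether the descriptor limit is the bottleneck.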