Memory Usage and SharedArrays

Hi all,

I am new to parallel computing and have a question about memory usage with SharedArrays. I run one operation on each worker and have it write to a particular slice of an Array.

If I understand correctly, a SharedArray makes the whole Array available to every worker. What if I wanted each worker to have access only to the particular slice of the array it writes to? My Array is large but not immense, and I suspect that I am overloading the memory.

Consider the following MWE

using Distributed
addprocs(8)
using SharedArrays


@everywhere function compute(vec)
    # scale the slice and add one random offset; returns a new array
    return vec ./ 10 .+ rand()
end



function innerloop(S)
    @everywhere GC.gc()
    V=rand(100,10,S*1136)
    s = SharedArray(zeros(100,10,S*1136))
    @sync @distributed for i in 1:S*1136
        s[:,:,i] = compute(V[:,:,i])
    end
    s
end

function outerloop(N,S)
    x=zeros(N)
    for n in 1:N
        x[n]=maximum(innerloop(S))
    end
    x
end


myres = outerloop(3,100);

I get the following error

schedule: Task not runnable
error(::String) at .\error.jl:33
enq_work(::Task) at .\task.jl:411
schedule at .\task.jl:426 [inlined]
uv_writecb_task(::Ptr{Nothing}, ::Int32) at .\stream.jl:985
poptaskref(::Base.InvasiveLinkedListSynchronized{Task}) at .\task.jl:564
wait() at .\task.jl:591
uv_write(::Sockets.TCPSocket, ::Ptr{UInt8}, ::UInt64) at .\stream.jl:883
unsafe_write(::Sockets.TCPSocket, ::Ptr{UInt8}, ::UInt64) at .\stream.jl:941
unsafe_write at .\io.jl:522 [inlined]
macro expansion at .\gcutils.jl:87 [inlined]
write at .\io.jl:545 [inlined]
serialize_array_data at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Serialization\src\Serialization.jl:246 [inlined]
serialize(::Distributed.ClusterSerializer{Sockets.TCPSocket}, ::Array{Float64,3}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Serialization\src\Serialization.jl:263
serialize_any(::Distributed.ClusterSerializer{Sockets.TCPSocket}, ::Any) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Serialization\src\Serialization.jl:629
serialize(::Distributed.ClusterSerializer{Sockets.TCPSocket}, ::Any) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Serialization\src\Serialization.jl:608
serialize_any(::Distributed.ClusterSerializer{Sockets.TCPSocket}, ::Any) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Serialization\src\Serialization.jl:629
serialize at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Serialization\src\Serialization.jl:608 [inlined]
serialize_msg(::Distributed.ClusterSerializer{Sockets.TCPSocket}, ::Distributed.CallMsg{:call}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\messages.jl:90
#invokelatest#1 at .\essentials.jl:790 [inlined]
invokelatest at .\essentials.jl:789 [inlined]
send_msg_(::Distributed.Worker, ::Distributed.MsgHeader, ::Distributed.CallMsg{:call}, ::Bool) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\messages.jl:185
#remotecall#146 at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\messages.jl:134 [inlined]
remotecall(::Function, ::Distributed.Worker) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\remotecall.jl:349
#remotecall#147(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(remotecall), ::Function, ::Int64) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\remotecall.jl:361
remotecall at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\remotecall.jl:361 [inlined]
spawnat(::Int64, ::Function) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\macros.jl:15
spawn_somewhere(::Function) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\macros.jl:17
macro expansion at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\macros.jl:46 [inlined]
(::getfield(Distributed, Symbol("##167#169")){getfield(Main, Symbol("##15#16")){Array{Float64,3},SharedArray{Float64,3}},UnitRange{Int64}})() at .\task.jl:253
sync_end(::Array{Any,1}) at task.jl:235
macro expansion at task.jl:254 [inlined]
innerloop(::Int64) at sa.jl:17
outerloop(::Int64, ::Int64) at sa.jl:26
top-level scope at sa.jl:32

and the following message in my REPL

WARNING: Workqueue inconsistency detected: popfirst!(Workqueue).state != :runnable
julia>       From worker 2:
      From worker 2:    Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
      From worker 2:    Exception: EXCEPTION_ACCESS_VIOLATION at 0x6b5d7b45 -- jl_assign_bits at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\datatype.c:610 [inlined]
      From worker 2:    jl_set_nth_field at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\datatype.c:977
      From worker 2:    in expression starting at none:0
      From worker 2:    jl_set_nth_field at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\datatype.c:968
      From worker 2:    deserialize at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Serialization\src\Serialization.jl:1292
      From worker 2:    jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\gf.c:2191
      From worker 2:    handle_deserialize at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Serialization\src\Serialization.jl:786
      From worker 2:    deserialize_msg at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Serialization\src\Serialization.jl:722
      From worker 2:    jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\gf.c:2191
      From worker 2:    jl_apply at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\julia.h:1614 [inlined]
      From worker 2:    jl_f__apply at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\builtins.c:563
      From worker 2:    jl_f__apply_latest at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\builtins.c:601
      From worker 2:    #invokelatest#1 at .\essentials.jl:790 [inlined]
      From worker 2:    invokelatest at .\essentials.jl:789 [inlined]
      From worker 2:    message_handler_loop at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\process_messages.jl:183
      From worker 2:    process_tcp_streams at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.2\Distributed\src\process_messages.jl:140
      From worker 2:    #105 at .\task.jl:268
      From worker 2:    jl_apply_generic at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\gf.c:2197
      From worker 2:    jl_apply at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\julia.h:1614 [inlined]
      From worker 2:    start_task at /home/Administrator/buildbot/worker/package_win64/build/src/home/Administrator/buildbot/worker/package_win64/build/src\task.c:596
      From worker 2:    Allocations: 4199815 (Pool: 4136163; Big: 63652); GC: 45

Bumping this post. Does anyone have a suggestion for this issue?

I tried the MWE without the GC.gc() line (not sure why it is there) and got the same error with Julia 1.2.

Using Julia 1.3, the machine was unresponsive for a while, but the run then finished without any problem.

Could you try with 1.3?

I just tried with Julia 1.3 and it works fine. However, if I increase the dimensions by increasing S, I soon run into an OutOfMemoryError(). Since I only want each worker to write to a specific slice of the Array, it seems to me there must be some workaround that I am not using.

Hi! Can someone help me with this please?

You could preallocate your large arrays V and s, like:

using Distributed
addprocs(8)
using SharedArrays


@everywhere function compute(vec)
    # scale the slice and add one random offset; returns a new array
    return vec ./ 10 .+ rand()
end



function innerloop(S,V,s)
    #@everywhere GC.gc()
    #V=rand(100,10,S*1136)
    #s = SharedArray(zeros(100,10,S*1136))
    for index in eachindex(V) V[index]=rand() end
    @sync @distributed for i in 1:S*1136
        s[:,:,i] = compute(V[:,:,i])
    end
    s
end

function outerloop(N,S,V,s)
    x=zeros(N)
    for n in 1:N
        x[n]=maximum(innerloop(S,V,s))
    end
    x
end

S=100
V=rand(100,10,S*1136);
s = SharedArray(zeros(100,10,S*1136));
myres = outerloop(3,S,V,s);

But at some value for S you will run out of memory.
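To get a rough feel for where that limit is, here is a back-of-the-envelope sketch of the footprint. It assumes Float64 elements and, as a pessimistic guess, that the plain Array V ends up on the master and once more on each of the 8 workers, while the shared s exists only once. The helper names array_bytes and worst_case_bytes are made up for this illustration, and the real peak depends on the Julia version and on when the garbage collector runs.

```julia
# Size in bytes of one 100 x 10 x (S*1136) Float64 array.
array_bytes(S) = 100 * 10 * S * 1136 * sizeof(Float64)

# Pessimistic estimate (an assumption, not a measurement):
# one shared `s`, one `V` on the master, and one copy of `V` per worker.
function worst_case_bytes(S; nworkers = 8)
    return array_bytes(S) * (1 + 1 + nworkers)
end

for S in (10, 100, 1000)
    one_gib   = round(array_bytes(S) / 2^30, digits = 2)
    worst_gib = round(worst_case_bytes(S) / 2^30, digits = 2)
    println("S = ", S, ": one array is about ", one_gib,
            " GiB, pessimistic total is about ", worst_gib, " GiB")
end
```

Comparing the pessimistic total with the RAM of the machine gives a quick idea of which S is still safe to try.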

And you don’t need V:

using Distributed
addprocs(8)
using SharedArrays


@everywhere function compute(vec)
    # scale the slice and add one random offset; returns a new array
    return vec ./ 10 .+ rand()
end



function innerloop(S,s)
    #@everywhere GC.gc()
    #V=rand(100,10,S*1136)
    #s = SharedArray(zeros(100,10,S*1136))
    for index in eachindex(s) s[index]=rand() end
    @sync @distributed for i in 1:S*1136
        s[:,:,i] = compute(s[:,:,i])
    end
    s
end

function outerloop(N,S,s)
    x=zeros(N)
    for n in 1:N
        x[n]=maximum(innerloop(S,s))
    end
    x
end

S=100
s = SharedArray(zeros(100,10,S*1136));
myres = outerloop(3,S,s);

But this may be just because of the MWE.

Thanks. Preallocating the memory did solve the issue.

I think the issue is that V is copied to each worker in the innerloop because it is not a SharedArray.

I can see memory spike in the workers when V is a plain Array, but not when it is a SharedArray.
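If that is the cause, one possible way around it is to make the input a SharedArray as well and to work on views, so that the loop body only carries handles to shared memory instead of the full data. This is a minimal sketch, not a tested fix for the original problem: it assumes all workers run on the same machine, uses 2 workers and a small array to stay cheap, and the names compute! and innerloop_shared are made up here.

```julia
using Distributed
addprocs(2)                       # small worker count, for illustration only
@everywhere using SharedArrays

# Hypothetical in-place variant of `compute`: reads `src`, writes into `dest`.
@everywhere function compute!(dest, src)
    dest .= src ./ 10 .+ rand()   # one random offset per slice, as in the MWE
    return dest
end

function innerloop_shared(n)
    V = SharedArray{Float64}(100, 10, n)   # input in shared memory
    s = SharedArray{Float64}(100, 10, n)   # output in shared memory
    for index in eachindex(V)
        V[index] = rand()
    end
    @sync @distributed for i in 1:n
        # views avoid allocating a temporary copy of each slice
        compute!(view(s, :, :, i), view(V, :, :, i))
    end
    return s
end

result = innerloop_shared(16)
println(maximum(result))
```

Each worker still sees the whole array, but since both arrays are mapped from the same shared memory segment, no worker should need its own full copy of the data.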