I have a fairly complicated bit of code involving shared arrays. I’m running into a memory error, although I can’t seem to produce a minimal working example. Pseudo-code of what I am doing follows:
```julia
# Set some parameter size_of_x
# Create object x here which is of type Vector{Vector{Float64}} with size controlled by size_of_x
for n = 1:N
    # Do some work here to update x
    # Create y which is of type Vector{SharedVector{Float64}} by transforming x
    # Note, y is much smaller than x and the size of y is independent of size_of_x
    @sync @distributed for k = 1:K
        # Call some functions on y
    end
end
```
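A runnable sketch of the loop structure above, which might serve as a starting point for a minimal working example. Everything concrete in it is an assumption rather than the original code: the sizes (`size_of_x`, `N`, `K`, `len_y`), the placeholder "work" on `x`, the transform that builds `y`, and the two local workers. The final `foreach(finalize, y)` line is an experiment, not a confirmed fix: it runs the finalizers registered for each `SharedArray` immediately instead of waiting for the garbage collector.

```julia
using Distributed
nprocs() == 1 && addprocs(2)      # assumption: two local worker processes
@everywhere using SharedArrays

# Hypothetical sizes: the original post does not give real values.
const size_of_x = 1_000   # controls the size of x
const N = 3               # outer iterations
const K = 8               # length of the distributed loop
const len_y = 4           # length of each SharedVector in y (independent of size_of_x)

# x :: Vector{Vector{Float64}}, sized by size_of_x
x = [zeros(size_of_x) for _ in 1:size_of_x]

for n = 1:N
    # Placeholder for "do some work here to update x"
    for v in x
        v .+= 1.0
    end

    # Placeholder transform: y :: Vector{SharedVector{Float64}}
    y = [SharedArray{Float64,1}((len_y,)) for _ in 1:2]
    for (i, s) in enumerate(y)
        s .= sum(x[i])
    end

    # Placeholder for "call some functions on y" on the workers
    @sync @distributed for k = 1:K
        sum(y[1]) + sum(y[2])
    end

    # Experiment (not a confirmed fix): run the SharedArray finalizers now
    # instead of waiting for the garbage collector.
    foreach(finalize, y)
end
```

Raising `size_of_x` is the knob to turn when trying to trigger the error; deleting the `foreach(finalize, y)` line makes the sketch mirror the original structure more closely.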
When size_of_x is small, this code works fine and produces sensible results, and when I watch my system resources, it appears to use all available CPUs during the distributed loop. When size_of_x is large, however, the first iteration of the outer loop works, but on the second iteration I get the following error:
```
ERROR: On worker 3:
SystemError: memory mapping failed: Cannot allocate memory
#parse#338 at ./parse.jl:217
parse at ./parse.jl:217
print_shmem_limits at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/SharedArrays/src/SharedArrays.jl:614
shm_mmap_array at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/SharedArrays/src/SharedArrays.jl:641
#6 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/SharedArrays/src/SharedArrays.jl:128
#109 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Distributed/src/process_messages.jl:265
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Distributed/src/process_messages.jl:56
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Distributed/src/process_messages.jl:65
#102 at ./task.jl:262
```
While the loop is running, I keep an eye on my system resources using Ubuntu’s System Monitor, and RAM usage never gets anywhere near the limit (I never seem to use more than 25% of what is available).
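To look at shared memory specifically rather than overall RAM, the `Shmem` field of `/proc/meminfo` (documented by the kernel as memory used by shared memory and tmpfs) can be read directly from Julia. A minimal Linux-only sketch; the helper name `meminfo_fields` is hypothetical. Printing these values once per outer iteration could show whether `Shmem` grows from one iteration to the next.

```julia
# Linux-only sketch: read selected fields from /proc/meminfo (values in kB).
function meminfo_fields(fields = ("MemTotal", "MemAvailable", "Shmem"))
    out = Dict{String,Int}()
    for line in eachline("/proc/meminfo")
        name, rest = split(line, ':'; limit = 2)
        if name in fields
            out[String(name)] = parse(Int, split(rest)[1])
        end
    end
    return out
end

info = meminfo_fields()
for name in ("MemTotal", "MemAvailable", "Shmem")
    # Convert kB to MB for readability
    haskey(info, name) && println(rpad(name, 14), div(info[name], 1024), " MB")
end
```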
I realize this is not a whole lot to go on, but I’m struggling to get a reproducible example, and I’m mostly wondering whether anyone knows what could cause an error like this. Is the problem really related to system RAM, or is it some other kind of memory limit?
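One non-RAM limit that could be ruled out: according to the Linux `mmap(2)` man page, `mmap` can fail with `ENOMEM` ("Cannot allocate memory") when the process’s maximum number of mappings would be exceeded, and that cap is set by `/proc/sys/vm/max_map_count`. Whether this is what happens here is not established. The sketch below only compares each Julia process’s current mapping count with that limit; the helper names `map_count` and `max_map_count` are hypothetical.

```julia
using Distributed

# Linux-only: number of memory mappings of the calling process ...
@everywhere map_count() = countlines("/proc/self/maps")
# ... and the kernel's per-process cap on that number.
max_map_count() = parse(Int, strip(read("/proc/sys/vm/max_map_count", String)))

limit = max_map_count()
for p in procs()
    # remotecall_fetch runs map_count on process p, so /proc/self is p's own view
    println("process ", p, ": ", remotecall_fetch(map_count, p), " of ", limit, " mappings")
end
```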
Note: I’ve read the section of the docs on parallel computing (several times).
My reading of the SharedArrays code is that it should print what the shared memory limits are.
I think you may have hit a bug!
Ah, though: you are using version 0.7. We are told 0.7 and 1.0 should be equivalent.
Could you try version 1.0.3 maybe?
```
colin@colin-Z270-HD3:~$ ipcs -l
------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398509481980
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 32000
max semaphores per array = 32000
max semaphores system wide = 1024000000
max ops per semop call = 500
semaphore max value = 32767
```
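The max seg size reported above, once converted to bytes, is larger than the largest 64-bit signed integer. Assuming (not confirmed) that `print_shmem_limits` parses this limit into an `Int`, as the `parse` frames in the backtrace suggest, that parse would fail, which might explain why the limits are never printed. The arithmetic, checked in Julia:

```julia
# Value copied from the `ipcs -l` output above (units: kbytes)
max_seg_kb = 18014398509465599

# Convert to bytes with BigInt so the multiplication itself cannot overflow
max_seg_bytes = big(max_seg_kb) * 1024
println(max_seg_bytes)                      # 18446744073692773376
println(max_seg_bytes > typemax(Int64))     # true

# Parsing that decimal string as Int64 does not succeed
println(tryparse(Int64, string(max_seg_bytes)) === nothing)   # true
```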
Sorry, I should have mentioned I was still using v0.7 (I’m still getting used to the global scope REPL rules). Using v1.0.3, I’m getting the same error message (i.e. without the additional info that it looks like I should be getting). For the sake of completeness, the error message on v1.0.3 is:
```
ERROR: On worker 4:
SystemError: memory mapping failed: Cannot allocate memory
#parse#332 at ./parse.jl:228
parse at ./parse.jl:228
print_shmem_limits at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/SharedArrays/src/SharedArrays.jl:614
shm_mmap_array at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/SharedArrays/src/SharedArrays.jl:641
#6 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/SharedArrays/src/SharedArrays.jl:128
#109 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Distributed/src/process_messages.jl:265
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Distributed/src/process_messages.jl:56
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Distributed/src/process_messages.jl:65
#102 at ./task.jl:259
```
I messed around with cat /proc/meminfo during a run, and it seemed to agree with what I was seeing on the System Monitor GUI, i.e. I wasn’t anywhere near a RAM limit. This is what it looks like just after the error is thrown:
Can you tell me which version of Ubuntu and which kernel (uname -r) you have?
I have access to different Debian versions at work. I don’t think the kernel is relevant, though; I’m just asking for completeness. I can try to reproduce this.
Will check the kernel for you tomorrow when I get to work. In the meantime, it’s a fresh install of Ubuntu 18.04 LTS (I only made the bootable USB a week ago).
I’ll also have another go at making an MWE tomorrow.