I am trying to run something in parallel. Suppose I have the following module loaded and that I am using ClusterManagers.jl
to connect to a cluster.
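For context, the connection looks roughly like this (a sketch; I am assuming a Slurm cluster here, but the same applies to the other ClusterManagers schedulers):

using Distributed, ClusterManagers
addprocs(SlurmManager(32)) # launch 32 workers through the scheduler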
@everywhere module ldf

using CSV, DataFrames # ...plus the other packages the module needs

function read_df()
    validstates = ("...") # tuple of the 50 state names (elided)
    # create a return vector, one DataFrame per state
    df_of_states = Vector{DataFrame}(undef, 50)
    for (i, vs) in enumerate(validstates)
        fn = "..." # per-state file path built from vs (elided here)
        df_of_states[i] = CSV.File(fn, header=false) |> DataFrame
    end
    return df_of_states
end
export read_df

const dfs = read_df()
export dfs

end
The line const dfs = read_df() makes it so that when the @everywhere block is run on all the workers, each worker reads the csv files and stores them in a const variable. I want to make use of these dataframes in other functions. Each dataframe is 20000 x 141.
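Just to illustrate what I mean by "other functions", the usage pattern is roughly the following (process_state is a hypothetical stand-in for my actual per-state computation):

@everywhere function process_state(i::Int)
    df = ldf.dfs[i] # the worker-local copy created by the @everywhere module
    return size(df, 1) # placeholder for the real work on that state's dataframe
end
results = pmap(process_state, 1:50) # each worker uses its own copy of dfs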
So now suppose that on node001 there are 32 workers launched, each with its own copy of ldf, each creating a variable called dfs and reading those files… I get the following error:
On worker 2:
│ SystemError: memory mapping failed: Too many open files in system
│ #systemerror#51 at ./error.jl:168
│ #systemerror#50 at ./error.jl:167 [inlined]
│ systemerror at ./error.jl:167 [inlined]
│ #mmap#1 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:209
│ #mmap#14 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
│ mmap at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Mmap/src/Mmap.jl:251 [inlined]
│ parsetape at /home/affans/.julia/packages/CSV/vyG0T/src/file.jl:474
│ #File#28 at /home/affans/.julia/packages/CSV/vyG0T/src/file.jl:252
│ read_df at /home/affans/lancetid_actualinfection_post.jl:33
│ top-level scope at /home/affans/lancetid_actualinfection_post.jl:39
│ eval at ./boot.jl:331
It’s a fairly obvious error… I am opening too many files, but I was wondering what the root cause is. It’s not an out-of-memory error (each worker has almost 4 GB of RAM to work with, out of 128 GB total). Judging from the stack trace, CSV.File memory-maps each file, so 32 workers each holding 50 mapped files comes to roughly 1600 open files on the node at once, which presumably blows past some descriptor limit.
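One thing I could try to confirm that hypothesis (a sketch, not verified; I believe CSV.File also accepts a byte vector as its source): slurp each file into memory so the descriptor is closed as soon as the read finishes, instead of staying mapped for the lifetime of the DataFrame:

df_of_states[i] = open(fn) do io
    CSV.File(read(io), header=false) |> DataFrame # parse from an in-memory buffer, no mmap
end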
The solution for me here is to move the read_df() call back to the head node and just pass the relevant data to the workers… but I just wanted to get some insight on this.
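Something like the following is what I have in mind (a sketch; set_dfs is a helper I would add, and it assumes the module no longer calls read_df() at load time):

using Distributed
dfs_master = ldf.read_df() # all file I/O now happens on the head node only
@everywhere function set_dfs(d)
    global dfs = d # give each worker its own copy without touching the filesystem
    return nothing
end
for p in workers()
    remotecall_fetch(set_dfs, p, dfs_master) # or send only the slice that worker needs
end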