Hi, Community!
I've run into a strange problem: it looks like a memory leak in the worker processes.
I start the service with a command like this:
/opt/julia-1.1.0/bin/julia --project=/opt/seismo-api-gm-concave-module.jl --load=/opt/seismo-api-gm-concave-module.jl/conf/seismo-api-gm-concave-module.conf --procs=3 /opt/seismo-api-gm-concave-module.jl/runservice.jl
Environment:
julia version 1.1.0
Linux gmm-1 4.4.0-141-generic #167-Ubuntu SMP Wed Dec 5 10:40:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 16.04.5 LTS
After startup I see 4 processes: 1 main process and 3 workers. The workers are started automatically with this command (from ps auxf):
/opt/julia-1.1.0/bin/julia -Cnative -J/opt/julia-1.1.0/lib/julia/sys.so -g1 --bind-to 127.0.0.1 --worker
The main process dispatches jobs through a RemoteChannel, as described in the docs:
const jobs = RemoteChannel(() -> Channel{Any}(10000))

@everywhere function do_work(jobs)   # define the work function everywhere
    while true
        event_set = take!(jobs)
        try
            result = pga_concave_hull_proc(event_set["event_id"],
                                           event_set["model"],
                                           event_set["config"])
            @info "POSTing data OK $(event_set["event_id"]) at $(now()) on worker $(myid()) and pid $(getpid())" result
        catch err
            @warn err
            @warn "ERROR occurred while processing ** $(event_set["event_id"]) ** at $(now()) on worker $(myid()) and pid $(getpid())"
        finally
            @info "perform garbage collection" GC.gc() InteractiveUtils.varinfo()
        end
    end
end
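On the master side the consumers are started and the channel is fed roughly like this (a sketch following the docs pattern; the event id and model string below are just placeholders, and config_mf2013_crustal_pga is one of the globals shown in the varinfo() output further down):

# start one consumer loop per worker (do_work never returns)
for p in workers()
    remote_do(do_work, p, jobs)
end

# producer side: push an incoming event onto the RemoteChannel
put!(jobs, Dict("event_id" => "example_event_id",
                "model"    => "mf2013_crustal_pga",
                "config"   => config_mf2013_crustal_pga))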
Right after startup each worker takes only 1.5% of memory: that is just the modules, global constants, and the functions propagated to the workers. Let's look at the varinfo() output on a worker while it is running inside the do_work(jobs) function:
Feb 14 13:08:20 gmm-1 julia[6750]: ┌ Info: perform garbage collection
Feb 14 13:08:20 gmm-1 julia[6750]: │ GC.gc() = nothing
Feb 14 13:08:20 gmm-1 julia[6750]: │ InteractiveUtils.varinfo() =
Feb 14 13:08:20 gmm-1 julia[6750]: │ name                                   size summary
Feb 14 13:08:20 gmm-1 julia[6750]: │ --------------------------------- --------- --------------------------------------
Feb 14 13:08:20 gmm-1 julia[6750]: │ API_TOKEN                         1.150 KiB String
Feb 14 13:08:20 gmm-1 julia[6750]: │ Base                                        Module
Feb 14 13:08:20 gmm-1 julia[6750]: │ Core                                        Module
Feb 14 13:08:20 gmm-1 julia[6750]: │ DB                                 59 bytes SQLite.DB
Feb 14 13:08:20 gmm-1 julia[6750]: │ Distributed                       1.243 MiB Module
Feb 14 13:08:20 gmm-1 julia[6750]: │ GET_REPORT_URL                     51 bytes String
Feb 14 13:08:20 gmm-1 julia[6750]: │ HEADERS_BASE                      1.666 KiB Dict{String,String} with 2 entries
Feb 14 13:08:20 gmm-1 julia[6750]: │ IP_ADDRESS                         19 bytes String
Feb 14 13:08:20 gmm-1 julia[6750]: │ MAX_LAT_GRID                        8 bytes Int64
Feb 14 13:08:20 gmm-1 julia[6750]: │ MAX_LON_GRID                        8 bytes Int64
Feb 14 13:08:20 gmm-1 julia[6750]: │ Main                                        Module
Feb 14 13:08:20 gmm-1 julia[6750]: │ PGA_INTERVALS                     112 bytes 9-element Array{Float64,1}
Feb 14 13:08:20 gmm-1 julia[6750]: │ PORT                                8 bytes Int64
Feb 14 13:08:20 gmm-1 julia[6750]: │ POST_PGA_CONCAVE_URL               74 bytes String
Feb 14 13:08:20 gmm-1 julia[6750]: │ concave_hull_with_log               0 bytes typeof(concave_hull_with_log)
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_crustal_pga         139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_crustal_pga_2       139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_crustal_pgv         139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_crustal_psa_03      139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_crustal_psa_10      139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_crustal_psa_30      139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_interplate_pga      139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_interplate_pgv      139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_interplate_psa_03   139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_interplate_psa_10   139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_interplate_psa_30   139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_intraplate_pga      139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_intraplate_pga_asid 139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_intraplate_pgv      139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_intraplate_psa_03   139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_intraplate_psa_10   139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ config_mf2013_intraplate_psa_30   139 bytes Params_mf2013
Feb 14 13:08:20 gmm-1 julia[6750]: │ do_work                             0 bytes typeof(do_work)
Feb 14 13:08:20 gmm-1 julia[6750]: │ get_grid                            0 bytes typeof(get_grid)
Feb 14 13:08:20 gmm-1 julia[6750]: │ json_concave_data                   0 bytes typeof(json_concave_data)
Feb 14 13:08:20 gmm-1 julia[6750]: │ parse_loc_values                    0 bytes typeof(parse_loc_values)
Feb 14 13:08:20 gmm-1 julia[6750]: │ parse_report_version_timestamp      0 bytes typeof(parse_report_version_timestamp)
Feb 14 13:08:20 gmm-1 julia[6750]: │ pga_concave_hull_proc               0 bytes typeof(pga_concave_hull_proc)
Feb 14 13:08:20 gmm-1 julia[6750]: │ post_pga_concave                    0 bytes typeof(post_pga_concave)
Feb 14 13:08:20 gmm-1 julia[6750]: └ project_dir                        44 bytes String
And this varinfo() output does not change even after 1000 jobs, because all the work is done inside functions and no global objects are created besides the initial global constants. So the varinfo() output is identical no matter how many times the jobs run.
But the memory of each process keeps growing. I end up in this situation:
- one worker has been killed by the Linux oom_killer;
- the two remaining workers take ~70% of system memory:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
juliaus+ 6750 0.0 1.5 1115920 256216 ? Ssl Feb13 0:31 /opt/julia-1.1.0/bin/julia --project=/opt/seismo-api-gm-concave-module.jl --load=/opt/seismo-api-gm-concave-mod
juliaus+ 6756 2.0 36.4 7476288 5984652 ? Ssl Feb13 91:55 \_ /opt/julia-1.1.0/bin/julia -Cnative -J/opt/julia-1.1.0/lib/julia/sys.so -g1 --bind-to 127.0.0.1 --worker
juliaus+ 6759 2.0 36.3 7317696 5978164 ? Ssl Feb13 91:46 \_ /opt/julia-1.1.0/bin/julia -Cnative -J/opt/julia-1.1.0/lib/julia/sys.so -g1 --bind-to 127.0.0.1 --worker
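For what it's worth, the same growth can also be watched from the master process by polling each worker's resident set size, roughly like this (a Linux-specific sketch: the second field of /proc/self/statm is the RSS in pages, assumed to be 4 KiB here, and Sys.maxrss() only reports the peak rather than the current value):

using Distributed

# current resident set size of the calling process, in MB (Linux only)
@everywhere rss_mb() = parse(Int, split(read("/proc/self/statm", String))[2]) * 4096 ÷ 2^20

for p in workers()
    current = remotecall_fetch(rss_mb, p)               # resident memory right now, in MB
    peak    = remotecall_fetch(Sys.maxrss, p) ÷ 2^20    # high-water mark, in MB
    @info "worker memory" worker=p current_mb=current peak_mb=peak
end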
Why do the workers take so much memory after a lot of jobs? And why doesn't varinfo() show the objects that take up all that memory?