I want to do some machine learning using MLJ on a cluster of 200+ nodes. To hook up my workers to the main process I want to use an ElasticManager from the ClusterManagers package (this way I can also use workers that arrive late because they had to wait in our queueing system).
I usually start my cluster workers and let the worker processes execute some code like

    using ClusterManagers
    using XYZ
    elastic_worker(cookie, ip, port)

where `cookie`, `ip` and `port` have the correct values.
In general this scheme of “using-first-then-connect-worker” works and makes the package XYZ available on the worker. This way I do not have to call `@everywhere using XYZ` after my cluster is up and running. Depending on XYZ and the number of nodes, such an `@everywhere using XYZ` can actually take ages. If you dare, try `@everywhere using DataFrames` on 200+ workers. Performance is miserable, probably because the 200 workers want to load the very same files at the same time.
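One way to probe the guess that simultaneous loading is the bottleneck would be to load the package on a few workers at a time. This is only a minimal sketch: `load_in_batches` and its `batchsize` keyword are hypothetical names of my own, and `Distributed.remotecall_eval` is an unexported (though documented) function of the Distributed standard library. I have not measured whether this helps on 200+ nodes.

```julia
using Distributed

# Hypothetical helper: run `using pkg` on the given worker ids in batches,
# so that only `batchsize` processes hit the file system at the same time.
function load_in_batches(pkg::Symbol, pids; batchsize::Integer = 20)
    ex = Expr(:using, Expr(:., pkg))          # same expression as `using <pkg>`
    for batch in Iterators.partition(pids, batchsize)
        # Blocks until every worker in this batch has finished loading.
        Distributed.remotecall_eval(Main, collect(batch), ex)
    end
    return nothing
end
```

It would be called as `load_in_batches(:DataFrames, workers())` once the workers are connected. Unlike `@everywhere`, this does not load the package on the main process.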
For a lot of packages this scheme works nicely. However, when I try it with MLJ (my preferred machine learning package):

    using ClusterManagers
    using MLJ
    ClusterManagers.init_worker("oo")

I get an error:

    ERROR: LoadError: AssertionError: isempty(PGRP.refs)
    Stacktrace:
     init_worker(::String, ::Distributed.DefaultClusterManager) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/cluster.jl:376
     ...

This is because MLJ is also trying to do something with workers and populates `PGRP.refs`.
Is there a way to load MLJ without initiating any parallel functionality?
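To illustrate what the assertion checks, here is a small sketch that needs only the Distributed standard library. It assumes a fresh Julia session, and it relies on `Distributed.PGRP`, which is an internal, undocumented global that may change between Julia versions. The `RemoteChannel` is just a stand-in for whatever a package might do while loading; I do not know whether MLJ does exactly this.

```julia
using Distributed

# Internal state that init_worker asserts to be empty (not public API).
refs = Distributed.PGRP.refs
empty_before = isempty(refs)

# Stand-in for a package that creates a remote reference at load time.
# Constructing a RemoteChannel from a function registers it in PGRP.refs.
ch = RemoteChannel(() -> Channel{Int}(1))

empty_after = isempty(refs)
@show empty_before empty_after
```

If `empty_after` is `false` at the point where the worker connects, `init_worker` would hit the same `isempty(PGRP.refs)` assertion as in the error above.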