Multiple Computer Example


#1

Does anybody have a really simple example of a local host farming out a task to another ssh-connected computer (with julia on it)?

Think of something like:

@everywhere function abc(n::Int)
      sum=0; for i=1:n; sum+=i; end;#for
      return ( readstring(`hostname`), sum )
end#function

hosts = [ "localhost", "friend.ucla.edu" ]

println( pmap( i -> abc(i), 1:1000, hosts ) )

The docs look a bit overwhelming on the subject.

/iaw


#2

From the Julia manual:

Julia can be started in parallel mode with either the -p or the --machinefile options. -p n will launch an additional n worker processes, while --machinefile file will launch a worker for each line in file file. The machines defined in file must be accessible via a passwordless ssh login, with Julia installed at the same location as the current host. Each machine definition takes the form [count*][user@]host[:port] [bind_addr[:port]]. user defaults to current user, port to the standard ssh port. count is the number of workers to spawn on the node, and defaults to 1. The optional bind-to bind_addr[:port] specifies the ip-address and port that other workers should use to connect to this worker.
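Reading that format literally, the count is attached to the host with a `*`, and a space separates the host from the optional bind address. Here is a rough sketch of how one line would be split under that reading. `parse_machine_line` is a made-up helper for illustration, not Julia's own parser, and it ignores the user@ and :port parts.

```julia
# Hypothetical helper: split one machinefile line following the documented form
# [count*][user@]host[:port] [bind_addr[:port]].
# Illustration only; this is not how Julia itself is guaranteed to parse the file.
function parse_machine_line(line::AbstractString)
    count = 1
    spec = strip(line)
    if occursin('*', spec)
        # Everything before the first '*' is the worker count.
        left, right = split(spec, '*'; limit=2)
        count = parse(Int, strip(left))
        spec = strip(right)
    end
    # Whitespace separates the host part from the optional bind address.
    fields = split(spec)
    host = String(fields[1])
    bind_addr = length(fields) > 1 ? String(fields[2]) : nothing
    return (count=count, host=host, bind_addr=bind_addr)
end

with_star = parse_machine_line("5*164.67.165.22")
with_space = parse_machine_line("5  164.67.165.22")
println(with_star)
println(with_space)
```

Under this reading, a line written with a space instead of `*` is taken as a host named "5" with 164.67.165.22 as its bind address, so the count has to be written as `5*164.67.165.22`.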


#3

Thanks, salchipapa. It doesn't like me, though. The installation is at exactly the same location on both machines.

$ julia --machinefile machinefile
ssh: Could not resolve hostname 5: nodename nor servname provided, or not known
ERROR: Unable to read host:port string from worker. Launch command exited with error?
read_worker_host_port(::Pipe) at ./distributed/cluster.jl:236
connect(::Base.Distributed.SSHManager, ::Int64, ::WorkerConfig) at ./distributed/managers.jl:391
create_worker(::Base.Distributed.SSHManager, ::WorkerConfig) at ./distributed/cluster.jl:443
setup_launched_worker(::Base.Distributed.SSHManager, ::WorkerConfig, ::Array{Int64,1}) at ./distributed/cluster.jl:389
(::Base.Distributed.##33#36{Base.Distributed.SSHManager,WorkerConfig,Array{Int64,1}})() at ./task.jl:335
Stacktrace:

reading the docs, it seems like all I need to do is to put into the machinefile

5  164.67.165.22

and I should be ready to go. (The IP is made up.) Is this the correct format for starting 5 processes on 164.67.165.22?

password-less and username-less ssh works just fine:

> ssh 164.67.165.22 '/Applications/Julia-0.6.app/Contents/Resources/julia/bin/julia -e "println(\"hello\")"'
hello

back to home

Does it need another open port? Anything else? What is the simplest way to find out what this error means?

regards,

/iaw


#4

Try something like this for now:

julia> addprocs([("machine1", 2), ("machine2", 1)])

This will launch 2 workers on machine1 and 1 worker on machine2.
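For the original question, a minimal end-to-end sketch might look like the following. It assumes Julia 1.x, where these functions live in the Distributed standard library (in the 0.6 used in this thread they were in Base). To keep it runnable anywhere, it starts two local workers; "machine1" and "machine2" in the comment are placeholder host names, and `gethostname()` stands in for shelling out to `hostname`.

```julia
using Distributed

# Assumption: local workers stand in for remote ones so the sketch runs anywhere.
# For ssh-reachable machines, pass host tuples instead, for example
# addprocs([("machine1", 2), ("machine2", 1)]) with real host names.
addprocs(2)

# Define the task on every process, workers included.
@everywhere function abc(n::Int)
    total = 0
    for i in 1:n
        total += i
    end
    return (gethostname(), total)
end

# pmap takes the function and the collection; the calls are spread over
# whatever workers exist, so no host list is passed here.
results = pmap(abc, 1:10)
println(results)
```

Each result pairs the name of the machine that did the work with the sum, so with remote workers the first field shows where each call ran.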


#5

Great, this works. It is incompatible with the -p julia switch, but works fine without it. I can not only add localhost as a host, but also call plain addprocs(). So I am pretty much all set. Thank you.

PS: If someone has a working plain machinefile, I am curious what it should have looked like.


#6

I remember now (I haven't used it in a while): it seems that either the documentation is wrong or that option is not working as intended. Try:

machinefile:

164.67.165.22
164.67.165.22
164.67.165.22
164.67.165.22
164.67.165.22

instead of:

5 164.67.165.22

Then:

julia -p 5 --machinefile machinefile

That worked for me last time! This should give you 10 workers: 5 local and 5 remote.
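One way to check how many workers actually came up, and where, is to ask each one for its hostname. This is only a sketch, assuming Julia 1.x with the Distributed standard library. If no workers are running yet, it starts two local ones as a stand-in for whatever -p or --machinefile launched.

```julia
using Distributed

# Assumption: when run without -p or --machinefile there are no workers yet,
# so start two local ones just to have something to count.
nprocs() == 1 && addprocs(2)

# Ask every worker for its hostname and tally the answers per host.
hosts = [remotecall_fetch(gethostname, w) for w in workers()]
tally = Dict{String,Int}()
for h in hosts
    tally[h] = get(tally, h, 0) + 1
end

println("workers: ", nworkers())
println(tally)
```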


#7

The machinefile version does not work on my end at all.

bash$ \julia -p 3 --machinefile machinefile
ERROR: connect: connection refused (ECONNREFUSED)ERROR: ERROR:
Stacktrace:connect: connection refused (ECONNREFUSED)connect: connection refused (ECONNREFUSED)

Stacktrace:
Stacktrace: [1] try_yieldto
( [1] try_yieldto
( [1] try_yieldto::(Base.##296#297{Task}, ::Base.##296#297{Task}, ::Base.##296#297{Task}, ::Task):: at Task./event.jl:189)
 at  [2] ./event.jl:189wait
( [2] )wait at ::(./event.jl:234Task)) at

and tons more output. The addprocs approach works fine.

regards,

/iaw