I recently went through the same learning process on my university’s cluster, and your second question was the hardest part for me. I found the easiest way was to submit to one of the queues using a PBS submit script that tells Julia where to find all the processors via the PBS nodefile. Here’s a minimal working example of such a PBS script:
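#!/bin/bash
# Request 4 nodes with 12 processors each (48 in total) for 5 minutes
#PBS -l nodes=4:ppn=12,walltime=00:05:00
#PBS -N test_julia
#PBS -q debug
echo "PBS: node file is $PBS_NODEFILE"
# Start one Julia worker per line of the node file.
# Julia 0.7 and later spell the flag --machine-file; older releases used --machinefile.
julia --machine-file "$PBS_NODEFILE" /path/to/your/home/dir/test_julia.jl
echo "finished"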
I called this file test_julia.pbs. When you submit this job (i.e., by running qsub test_julia.pbs from your login prompt), the Julia process starts up with all the processors (48 in this case) available, as if you’d started it on your laptop with julia -p 2 or whatever. For completeness, here’s test_julia.jl, which has minimal working examples for basic batch-processing tasks:
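using Distributed  # needed on Julia 0.7 and later; the workers were already added from the machine file at startup

println("Hello from Julia")
np = nprocs()
println("Number of processes: $np")

# Ask each worker to report its host name and process ID
for i in workers()
    host, pid = fetch(@spawnat i (gethostname(), getpid()))
    println("Hello from process $(pid) on host $(host)!")
end

# Define foo on every process so the workers can call it
@everywhere function foo(x)
    return x * 4
end

# Spread the tasks over the workers and collect the results in order
tasks = randn(np * 30)
results = pmap(foo, tasks)
println(round.(results, digits=3))  # on Julia 0.6 and earlier: round(results, 3)

# Shut the workers down cleanly
rmprocs(workers())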
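If you’re wondering where the 48 comes from: on Torque-style PBS installations the nodefile typically lists each host name once per requested processor, so nodes=4:ppn=12 gives 48 lines, and Julia starts one worker per line (nprocs() should then report one more than that, since it also counts the master process). Here’s a rough sketch that fakes such a nodefile and counts the slots. The host names node01 to node04 are made up, and your scheduler’s nodefile layout may differ, so treat this as an illustration rather than a description of your cluster:

```shell
#!/bin/bash
# Sketch only: build a mock nodefile like the one PBS would typically write
# for nodes=4:ppn=12. The host names node01..node04 are hypothetical.
nodefile=$(mktemp)
for host in node01 node02 node03 node04; do
    for slot in $(seq 1 12); do
        echo "$host"
    done
done > "$nodefile"

# Each line is one slot, so the line count is the number of workers Julia would start.
total_slots=$(wc -l < "$nodefile" | tr -d ' ')
# The number of distinct host names is the number of physical nodes.
unique_hosts=$(sort -u "$nodefile" | wc -l | tr -d ' ')

echo "slots: $total_slots, hosts: $unique_hosts"
rm -f "$nodefile"
```

Inside a real job you could run the same two counting lines against "$PBS_NODEFILE" itself to check what the scheduler actually gave you before Julia starts.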
Hope that helps!