I’d like to get some input on how to organize my code to run on a computer cluster. I have programmed the computation I want to perform in a file, main.jl. Its contents are something like this:
```julia
#! main.jl
using Pkg
Pkg.activate(".")
Pkg.instantiate()

using Random
using Transducers  # provides the DistributedEx executor
using Folds

Random.seed!(123)

const R = 5

simulate_data() = rand(100_000, 100)

function auxiliarymodel(data)
    xx = data[:, 2:end]
    y = data[:, 1]
    return xx \ y
end

likelihood(x) = sum(x)

function func(R, params)
    function sdatamodel(i)
        sdata = simulate_data()
        return auxiliarymodel(sdata)
    end
    avgb = Folds.mapreduce(sdatamodel, +, 1:R, DistributedEx(); init=zero(params)) ./ R
    return likelihood(avgb)
end

func(50, ones(99)) |> println
```
I’m defining and calling functions in main.jl. Those functions are mostly serial, but func is costly, and that is the one I’d like to compute using multiple processes.
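For context, the pattern inside func is a mapreduce over simulation indices followed by an elementwise average. Here is a minimal, self-contained sketch of that same pattern, run with the sequential executor so it works in a single process; `simulate` and the vector sizes here are toy stand-ins for my real `sdatamodel`:

```julia
using Folds
using Transducers  # SequentialEx executor

# Toy stand-in for sdatamodel: each "simulation" returns a small vector
# (the real code returns the coefficient vector from auxiliarymodel).
simulate(i) = fill(Float64(i), 3)

R = 4
# Sum the R simulated vectors, then average elementwise.
avg = Folds.mapreduce(simulate, +, 1:R, SequentialEx(); init=zeros(3)) ./ R
println(avg)  # → [2.5, 2.5, 2.5]
```

Swapping `SequentialEx()` for `DistributedEx()` is what should push the work out to the worker processes.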
I also have a second file, multiproc.jl, that sets up the multiprocess computation and includes main.jl:

```julia
#! multiproc.jl
using Distributed
addprocs()
@everywhere include("main.jl")
```
My idea is to start Julia and execute multiproc.jl:

```
$ julia -p 10 multiproc.jl
```
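As a sanity check before the real run, I assume something like the following would confirm the worker processes are actually live; the `addprocs(2)` count here is just illustrative (on the cluster it would be `-p 10` / `addprocs()` as above):

```julia
using Distributed

addprocs(2)          # illustrative small count for a local test
println(nworkers())  # → 2

# Confirm each process (master and workers) responds.
@everywhere println("process $(myid()) ready")
```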
Is this a good way to organize the code? And, most importantly, would it achieve what I want?