I’d like to get some input on how to organize my code to run on a computer cluster. I have programmed the computation I want to perform in a file, main.jl. The contents of main.jl would be something like this:
#! main.jl
using Pkg
Pkg.activate(".")
Pkg.instantiate()

using Random
using Folds

Random.seed!(123)
const R = 5

simulate_data() = rand(100_000, 100)

function auxiliarymodel(data)
    xx = data[:, 2:end]
    y = data[:, 1]
    return xx \ y
end

likelihood(x) = sum(x)

function func(R, params)
    function sdatamodel(i)
        sdata = simulate_data()
        return auxiliarymodel(sdata)
    end
    avgb = Folds.mapreduce(sdatamodel, +, 1:R, DistributedEx(); init=zero(params)) ./ R
    return likelihood(avgb)
end

func(50, ones(99)) |> println
I’m defining and calling functions in main.jl. Those functions are mostly serial, but the function func is costly, and that is the one I’d like to compute using multiple processors.
I also wrote another file, multiproc.jl, that includes main.jl and sets up the multiprocessor computation.
#! multiproc.jl
using Distributed
addprocs()
@everywhere include("main.jl")
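To make the intended setup concrete, here is a minimal sketch of the pattern I am aiming for, using only the Distributed standard library. The worker count of 2, the function name `costly`, and the use of `pmap` are placeholders for illustration: they stand in for the expensive part of func and are not part of my actual code.

```julia
using Distributed

# Start two local worker processes (an arbitrary number for this sketch).
addprocs(2)

# Definitions must exist on every process that will evaluate them.
@everywhere begin
    # Hypothetical stand-in for the expensive per-replication work.
    costly(i) = sum(abs2, 1:i)
end

# Only the master process drives the computation; the workers evaluate `costly`.
results = pmap(costly, 1:4)
total = sum(results)
println(total)

# Release the workers once the computation is done.
rmprocs(workers())
```

In this sketch the definitions are sent to all processes, while the call that starts the computation runs once, on the master process only.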
My idea is to start Julia and execute multiproc.jl:
$ julia -p 10 multiproc.jl
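As far as I understand, the `-p 10` flag already starts 10 worker processes, and a bare `addprocs()` adds one more worker per CPU thread on top of those. A quick way to see how many workers a given launch ends up with is to print `nworkers()`. The sketch below is self-contained and assumes it is started with a plain `julia` (no `-p`); the count of 2 is arbitrary.

```julia
using Distributed

# With no workers yet, nworkers() reports 1 (the master process itself).
before = nworkers()

# addprocs returns the ids of the newly started worker processes.
added = addprocs(2)
after = nworkers()

println("workers before: ", before, ", after: ", after)

# Remove only the workers this sketch started.
rmprocs(added)
```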
Is this a good organization of the code? And, most importantly, would it achieve what I want?