How to avoid large data transfers when using remotecall_fetch?

I wrote code to solve 10k linear programming problems in parallel. I timed it on a computer with nprocs()=4, so there are three workers. Ideally it should be about three times faster than the non-parallel code, but it is only two times faster. I think the problem is here:

remotecall_fetch(fun_sim_optimal_assignment, p, w_sim_column[t,idx])

The argument w_sim_column[t,idx] is a long vector of length 250k, so I suspect that transmitting the data takes too much time. Is there a way to avoid this data movement?
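One way to check this suspicion would be to time a remote call that ships a vector of the same size but does almost no work on the worker. This is only a rough sketch: it assumes one local worker added with addprocs, and the vector `w` is random stand-in data, not the real w_sim_column.

```julia
using Distributed

# Assumption: start one local worker if none exists yet
nprocs() == 1 && addprocs(1)

w = rand(250_000)        # stand-in for one w_sim_column[t, idx] (about 2 MB of Float64)
p = first(workers())

# Warm-up call so that compilation time is not counted in the measurement
remotecall_fetch(length, p, w)

# `length` does almost no work, so this mostly measures serialization and transfer
t_transfer = @elapsed remotecall_fetch(length, p, w)
n_remote = remotecall_fetch(length, p, w)

println("one transfer of ", n_remote, " elements took about ", t_transfer, " s")
```

Comparing this number with the time of one call to fun_sim_optimal_assignment should indicate whether the transfer is a meaningful share of the total.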

I have one “brute-force” idea. Because w_sim_column can be generated from just a couple of parameters, I am thinking of writing a new function that generates w_sim_column and loading it in every process. Would Julia then automatically use the copy of w_sim_column in the local process to solve the problem?
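A minimal sketch of that idea, under stated assumptions: `generate_w_sim` and `solve_from_params` are hypothetical names, the seeding scheme is purely illustrative, and `sum` stands in for the real call to fun_sim_optimal_assignment. Only a few integers are passed to remotecall_fetch, so the worker builds the long vector itself instead of receiving it.

```julia
using Distributed

# Assumption: start two local workers if none exist yet
nprocs() == 1 && addprocs(2)

@everywhere using Random

# Hypothetical generator: rebuilds the weight vector for (t, idx) from a few parameters.
# The seed formula is illustrative only; the real generator would use the model's parameters.
@everywhere function generate_w_sim(t::Int, idx::Int, N::Int, seed::Int)
    rng = MersenneTwister(seed + 100_003 * t + idx)
    return rand(rng, N * N)
end

# Hypothetical wrapper that runs on the worker: the vector is created locally,
# so only four integers cross the process boundary.
@everywhere function solve_from_params(t::Int, idx::Int, N::Int, seed::Int)
    w_sim = generate_w_sim(t, idx, N, seed)
    return sum(w_sim)   # stand-in for fun_sim_optimal_assignment(w_sim)
end

p = first(workers())
result_remote = remotecall_fetch(solve_from_params, p, 1, 1, 4, 2024)
result_local = solve_from_params(1, 1, 4, 2024)
println(result_remote == result_local)
```

In the parallel loop, the call would then become something like remotecall_fetch(solve_from_params, p, t, idx, N, seed) in place of passing w_sim_column[t,idx].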

# This is the code for parallel computing
using Distributed

function fun_H_sim_parallel(fun_sim_optimal_assignment::Function, w_sim_column::Matrix{Vector{Float64}}, num_simulation::Int64, T::Int64, N::Int64)
    H_sim_temp = Matrix{Matrix{Int64}}(undef, T, num_simulation)
    np = nprocs()
    for t = 1:T
        i = 1
        # Hand out the next unsolved simulation index to whichever worker asks for it
        nextidx() = (idx=i; i+=1; idx)
        @sync begin
            for p = 1:np
                if p != myid() || np == 1
                    @async begin
                        while true
                            idx = nextidx()
                            if idx > num_simulation
                                break
                            end
                            # The whole vector w_sim_column[t,idx] is serialized and sent to process p
                            H_sim_temp[t,idx] = remotecall_fetch(fun_sim_optimal_assignment, p, w_sim_column[t,idx])
                        end
                    end
                end
            end
        end
    end
    return H_sim_temp
end

# This is the function being called to solve the linear programming problem
@everywhere using JuMP, Gurobi

@everywhere function fun_sim_optimal_assignment(w_sim::Vector{Float64})
    N = convert(Int64, sqrt(length(w_sim)))
    w_sim = reshape(w_sim, N, N)
    model_sim = Model(optimizer_with_attributes(Gurobi.Optimizer, "Presolve" => 0, "Method" => 1, "OutputFlag" => 0))
    @variable(model_sim, H_sim_temp[1:N, 1:N] >= 0)
    @expression(model_sim, row_sum_sim[i = 1:N], sum(H_sim_temp[i, j] for j = 1:N))
    @expression(model_sim, column_sum_sim[j = 1:N], sum(H_sim_temp[i, j] for i = 1:N))
    @constraint(model_sim, row_constraint_sim[i = 1:N], row_sum_sim[i] == 1)
    @constraint(model_sim, column_constraint_sim[j = 1:N], column_sum_sim[j] == 1)
    @objective(model_sim, Max, sum(w_sim[i, j] * H_sim_temp[i, j] for i = 1:N, j = 1:N))
    optimize!(model_sim) # Solve the model before querying its status
    if termination_status(model_sim) == MOI.OPTIMAL
        H_market_sim = value.(H_sim_temp)
    else
        error("The model was not solved correctly.")
    end
    H_star_sim = round.(Int64, H_market_sim) # Allocation matrix
    return H_star_sim
end