Slow parallel for loop

Zhilong_Liu · April 11, 2017, 1:57am

Hi everybody,

I have a problem with parallelizing a for loop. The computation is extremely slow, even slower than the serial version. The for loop code is as follows:

  N = 20
  d_u = 0.5*rand(N)
  d_v = 0.5*rand(N)
  cost_update = SharedArray(Float64, N)

  @parallel for k in eachindex(d_u)
    a = p_1     # dynamic lower bound
    b = p_2     # dynamic upper bound
    x_1 = a + (1-gr)*(b-a)
    x_2 = a + gr*(b-a)

    # run golden section search
    cost = zeros(2,1)
    while norm(b-a) > tol

      # compute cost for upper and lower bounds
      lambda_12 = [norm(x_1-p_1) / l_0;
                   norm(x_2-p_1) / l_0]
      theta_gnd = [atan2(x_1[2]-p_i[2], x_1[1]-p_i[1]);   # backward propagation
                   atan2(x_2[2]-p_i[2], x_2[1]-p_i[1])]
      x_gnd = [x_1[1]; x_2[1]] - vertex_min[1] + 1
      y_gnd = [x_1[2]; x_2[2]] - vertex_min[2] + 1
      cost  = [norm(x_1-p_i); norm(x_2-p_i)] .*
              cost_profile_wind(x_wf,  y_wf,  u_wf,  v_wf, d_u[k], d_v[k],
                                x_gnd, y_gnd, V_gnd, theta_gnd) +
              lambda_12.*u_2 + (1-lambda_12).*u_1

      # update upper or lower bounds as necessary
      if cost[1] < cost[2]
        b = x_2
        x_2 = x_1
        x_1 = a + (1-gr)*(b-a)
      else
        a = x_1
        x_1 = x_2
        x_2 = a + gr*(b-a)
      end

    end

    # update cost to return
    cost_update[k] = mean(cost)

  end

The variables x_wf, y_wf, u_wf, v_wf define a vector field of wind. Inside the for loop there is a while loop, which calls a function cost_profile_wind() that computes a cost due to wind.

Can anybody help me get it to run faster? If more code is needed, I can provide it.

Thanks!

Ralph_Smith · April 11, 2017, 3:07am

The manual (under “Parallel Computing”) says:

“Any variables used inside the parallel loop will be copied and broadcast to each process.”

This would seem to apply to the (presumably large) arrays x_wf, y_wf, u_wf, and v_wf. Putting them in shared arrays may help, especially if your worker processes share memory.
Could someone tell us (or better yet, point to a practical way of determining) how often the copy/broadcast operation occurs? It may be once per while iteration; after all, the compiler doesn’t know whether they are modified by the cost_profile_wind function.
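
To make the shared-array suggestion concrete, here is a minimal sketch. It is written for Julia 1.x, where @parallel was renamed @distributed and SharedArray(Float64, N) became SharedArray{Float64}(N). The function toy_cost, the wrapper run_shared, and the field size M are hypothetical stand-ins, since cost_profile_wind and the real wind field are not shown in the post. Only u_wf and v_wf are modeled here.

```julia
using Distributed

# Add local workers only if this session has none (local workers share memory).
nprocs() == 1 && addprocs(2)
@everywhere using SharedArrays

# Hypothetical stand-in for cost_profile_wind(); the real function is not shown.
@everywhere toy_cost(u, v, du, dv) = sum(abs2, u .+ du) + sum(abs2, v .+ dv)

function run_shared(N = 20, M = 200)
    # Wind field placed in shared memory once, instead of being copied to each worker.
    u_wf = SharedArray{Float64}(M, M)
    v_wf = SharedArray{Float64}(M, M)
    u_wf .= rand(M, M)
    v_wf .= rand(M, M)

    d_u = 0.5 * rand(N)
    d_v = 0.5 * rand(N)
    cost_update = SharedArray{Float64}(N)

    # @sync waits for all workers; without it the loop returns before the work is done.
    @sync @distributed for k in eachindex(d_u)
        cost_update[k] = toy_cost(u_wf, v_wf, d_u[k], d_v[k])
    end

    # Serial reference computed on the master process for comparison.
    serial = [toy_cost(u_wf, v_wf, d_u[k], d_v[k]) for k in eachindex(d_u)]
    return Array(cost_update), serial
end

parallel_cost, serial_cost = run_shared()
```

Wrapping the loop in a function keeps the arrays local rather than global. With only N = 20 iterations, the cost of scheduling work on the workers may still outweigh the gain unless each iteration is expensive, so it is worth timing both versions (for example with @time, after a warm-up call) before committing to either.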
