Automatic Differentiation and Parallelism



I am trying to optimize a function that is expensive to evaluate but has many component parts that can be computed in parallel. However, DualNumbers are not bits-types and therefore cannot be stored in SharedArrays. Is there an analogue of SharedArrays that avoids reallocating and redistributing memory across many processors for intermediate results that contain DualNumbers?

As an example, let’s say we wanted to minimize f(X,theta) with respect to theta, where X is a large SharedArray not modified by f. (We use a SharedArray to avoid the overhead of passing X to all of the processors many times.)

Concretely, let f(X,theta) = sum_k h(x_k, g(X,theta)), where x_k is the k-th row of X. Once g = g(X,theta) has been computed, f(X,theta) = sum_k h(x_k, g) is embarrassingly parallelizable, so the “obvious” way to evaluate f(X,theta) is to compute g once and parallelize the computation of sum_k h(x_k, g) over k. However, during optimization using automatic differentiation, g will hold dual numbers. That means that we can’t just stick g in a SharedArray. Instead we have to pass g to each processor, even though g isn’t modified at all during the parallel computation of sum_k h(x_k, g).
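To make the pattern concrete, here is a minimal sketch of that evaluation with plain Float64 values. The bodies of g and h are hypothetical stand-ins (the post does not specify them), and the sketch assumes theta has one entry per column of X. It also runs on a single process; with workers added via addprocs, the captured g is what gets shipped to each worker.

```julia
using Distributed
@everywhere using SharedArrays

# Hypothetical stand-ins for the g and h of the post.
@everywhere g(X, theta) = theta .* vec(sum(X, dims=1))   # computed once per evaluation
@everywhere h(x, gval) = sum(abs2, x .- gval)            # term for a single row

function f(X::SharedArray, theta)
    gval = g(X, theta)   # serial step
    # gval is captured by the loop body, so it is sent to every worker;
    # X lives in shared memory and is not copied.
    return @distributed (+) for k in 1:size(X, 1)
        h(view(X, k, :), gval)
    end
end
```

Calling f(X, theta) with X a SharedArray{Float64} of size n-by-d and theta a vector of length d returns the sum over the n rows.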

Does anyone have a suggestion for a good way to avoid reallocating and passing g around many times in such a case, especially since it isn’t even modified during the parallel computation?

Thank you!


DualNumbers are not bits-types

Assuming you’re talking about Dual from DualNumbers.jl, Dual{T} is a bits type as long as T is a bits type (you can check with the isbits function).
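A quick way to run that check, assuming DualNumbers.jl is installed. Both calls below should return true, since a Dual{Float64} is just two Float64 fields. On Julia 0.7 and later the type-level check is spelled isbitstype(Dual{Float64}), while isbits on a value works as shown.

```julia
using DualNumbers

isbits(1.0)               # a plain Float64 value
isbits(Dual(1.0, 0.0))    # a Dual{Float64} value (value part 1.0, epsilon part 0.0)
```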


@tkoolen, thanks for letting me know. I did not know that and just verified that one can, in fact, generate a SharedArray with element-type Dual.

Given this, my current thinking is to implement g as a SharedArray of DualNumbers.Dual{Float64} and then just fill that same vector with either Floats or Duals depending on whether I’m on a gradient or evaluation step. I imagine this would add some unnecessary overhead from computing the “epsilon” parts of the Duals even when they are all zero (i.e. in the function evaluation step). However, there may be clever ways to fill a SharedArray{Float64} in function calls and a SharedArray{Dual{Float64}} in gradient calls.
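One possible way to do that, as a rough sketch rather than a tested design: allocate both shared buffers once and pick one by dispatching on the element type of the current step. GBuffers and buffer are hypothetical names introduced here for illustration. Evaluation steps then never touch epsilon parts, at the cost of holding two buffers.

```julia
using SharedArrays, DualNumbers

# Hypothetical container holding one shared buffer per element type.
struct GBuffers
    vals::SharedVector{Float64}
    duals::SharedVector{Dual{Float64}}
end

# Allocate both buffers once, up front (contents are not initialized here).
GBuffers(n::Integer) = GBuffers(SharedVector{Float64}(n),
                                SharedVector{Dual{Float64}}(n))

# Select the buffer matching the element type of the current step.
buffer(b::GBuffers, ::Type{Float64}) = b.vals
buffer(b::GBuffers, ::Type{Dual{Float64}}) = b.duals

b = GBuffers(3)
gbuf = buffer(b, eltype([Dual(1.0, 1.0)]))   # gradient step: the Dual buffer
gbuf .= Dual(2.0, 1.0)                       # fill in place, no reallocation
```

Inside the objective, buffer(b, eltype(theta)) would select the right storage, assuming theta carries Float64 entries during evaluation and Dual{Float64} entries during gradient calls.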

Thanks for the help!