I am trying to optimize a function that is expensive to evaluate but has many component parts that can be computed in parallel. However, DualNumbers are not bits types and therefore cannot be stored in SharedArrays. Is there an analogous way to avoid repeatedly allocating and distributing memory across many processes for intermediate results that contain DualNumbers?
As an example, let’s say we wanted to minimize f(X, theta) with respect to theta, where X is a large SharedArray that f does not modify. (We use a SharedArray to avoid the overhead of passing X to all of the worker processes many times.)
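For concreteness, a minimal sketch of that setup might look like the following. The worker count and the dimensions of X are arbitrary placeholders; the point is only that X is allocated once and shared, rather than copied to each process.

```julia
using Distributed
addprocs(2)                       # spawn local worker processes (count is arbitrary)
@everywhere using SharedArrays

# X is allocated once in shared memory; all workers on this machine see
# the same underlying buffer, so it is never copied between processes.
# (SharedArrays only share memory among processes on a single machine.)
X = SharedArray{Float64}(10_000, 5)
X .= randn(10_000, 5)
```

Note that this only works because Float64 is a bits type; replacing the element type with a DualNumber is exactly what fails.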
Concretely, let f(X, theta) = sum_k h(x_k, g(X, theta)), where x_k is the k-th row of X. Clearly, once g = g(X, theta) has been computed, f(X, theta) = sum_k h(x_k, g) is embarrassingly parallel, so the “obvious” way to evaluate f(X, theta) is to compute g once and parallelize the sum over k. However, during optimization using automatic differentiation, g will take on dual-number values. That means we can’t just stick g in a SharedArray. Instead we have to pass g to each process, even though g isn’t modified at all during the parallel computation of sum_k h(x_k, g).
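To make the structure explicit, here is a toy sketch of the evaluation pattern I mean. The functions g and h below are hypothetical stand-ins (the real ones are expensive); the comments mark where the repeated shipping of g happens.

```julia
using Distributed, SharedArrays

# Hypothetical stand-ins for the g and h described above.
@everywhere g(X, theta) = theta[1] * sum(X) / length(X)   # some scalar summary
@everywhere h(x, gval) = sum(abs2, x .- gval)

function f(X::SharedArray, theta)
    gval = g(X, theta)            # compute g once...
    # ...but gval is captured by the closure below, so it gets serialized
    # and sent to every worker on every call to f. When theta carries dual
    # numbers, gval does too, and it cannot live in a SharedArray instead.
    @distributed (+) for k in 1:size(X, 1)
        h(view(X, k, :), gval)
    end
end
```

The sum over k parallelizes cleanly; the question is only about avoiding the repeated serialization of the read-only gval.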
Does anyone have a suggestion for a good way to avoid reallocating and passing g around many times in such a case, especially since it isn’t even modified during the parallel computation?