ForwardDiff.jacobian of softmax vs analytic Jacobian

I am trying out automatic differentiation on part of my code in place of the analytic derivative, but ForwardDiff.jacobian applied to a softmax seems to take about twice as long as the analytic Jacobian. I'm not sure whether this is expected or whether I'm doing something that slows ForwardDiff down. Any feedback is appreciated.

Below is an MWE with data representative of my problem. The loop inside the functions is also representative (i.e. computing a list of Jacobians, one per 10-row block of a larger array). I tried passing an anonymous function to ForwardDiff.jacobian, as well as a function that doesn't compute the softmax in place (both left commented out below). I know that pushing to an Array{Any} is not great, but that's just for the MWE.

using ForwardDiff
using BenchmarkTools

# softmax adapted from StatsFuns, minus the convert (and with a type-stable accumulator)
function softmax2(x::AbstractArray{T}) where {T<:Real}
    n = length(x)
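    # subtract the maximum before exponentiating for numerical stability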
    u = maximum(x)
    s = zero(T)  # a 0. literal is Float64 and makes the loop type-unstable for Dual inputs
    r = similar(x)
    @inbounds for i = 1:n
        s += (r[i] = exp(x[i] - u))
    end
    invs = inv(s)
    @inbounds for i = 1:n
        r[i] *= invs
    end
    r
end
function softmax2!(r::AbstractArray{T}, x::AbstractArray{T}) where {T<:Real}
    n = length(x)
    length(r) == n || throw(DimensionMismatch("Inconsistent array lengths."))
    u = maximum(x)
    s = zero(T)  # type-stable accumulator, as in softmax2
    @inbounds for i = 1:n
        s += (r[i] = exp(x[i] - u))
    end
    invs = inv(s)
    @inbounds for i = 1:n
        r[i] *= invs
    end
    r
end

function jac1(x, beta)
    result = Array{Any}(undef, 0)
    for i in 1:10:(size(x, 1) - 9)
        thisx = x[i:(i + 9), :]
        pp!(y::Vector, b::Vector) = softmax2!(y, thisx * b)
        # pp! = (y, b) -> softmax2!(y, thisx * b)
        p = zeros(Float64, 10)
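        # Jacobian of b -> softmax2!(p, thisx * b); p is also overwritten with the primal softmax values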
        lambda = ForwardDiff.jacobian(pp!, p, beta)::Array{Float64, 2}
        # pp = b -> softmax2(thisx * b)
        # 𝐩 = pp(beta)
        # lambda = ForwardDiff.jacobian(pp, beta)
        push!(result, lambda)
    end
    return result
end
function jac2(x, beta)
    result = Array{Any}(undef, 0)
    for i in 1:10:(size(x, 1) - 9)
        thisx = x[i:(i + 9), :]
        p = softmax2(thisx * beta)
        lambda = similar(thisx)
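        # analytic Jacobian of p = softmax(thisx * beta):
        #   dp[t]/dbeta[k] = p[t] * (thisx[t, k] - sum_s thisx[s, k] * p[s])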
        for k in 1:length(beta)
            last_term = 0.
            for s in 1:10
                last_term += thisx[s, k] * p[s]
            end
            for t in 1:10
                lambda[t, k] = p[t] * (thisx[t, k] - last_term)
            end
        end
        push!(result, lambda)
    end
    return result
end

X = [0.398106 0.496961; -0.612026 -0.224875; 0.34112 -1.11714;
     -1.12936 -0.394995; 1.43302 1.54983; 1.9804 -0.743514; -0.367221 -2.33171;
     -1.04413 0.812245; 0.56972 -0.501311; -0.135055 -0.510887;
     2.40162 -1.21536; -0.03924 -0.0225586; 0.689739 0.701239;
     0.0280022 -0.587482; -0.743273 -0.606728; 0.188792 1.09664;
     -1.80496 -0.24751; 1.46555 -0.159902; 0.153253 -0.625778;
     2.17261 0.900435]
β = [0.924499; 0.869371]
X = repeat(X, outer = (5, 2))
β = repeat(β, 2)
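# X is now 100×4 and β has length 4, so each function returns 10 Jacobians of size 10×4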

display(isapprox(jac1(X, β), jac2(X, β)))

@btime jac1(X, β)
@btime jac2(X, β)

Output

true
  8.137 μs (94 allocations: 25.50 KiB)
  4.086 μs (44 allocations: 12.22 KiB)
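
One more variant I considered but haven't benchmarked carefully: hoisting the block buffer, the output vector, and a ForwardDiff.JacobianConfig out of the loop, so the closure and the dual work buffers are built once instead of per iteration. jac1_cfg is just a name I made up for this sketch (it reuses softmax2! from above).

function jac1_cfg(x, beta)
    result = Matrix{Float64}[]
    thisx = similar(x, 10, size(x, 2))  # block buffer, overwritten each iteration
    pp!(y, b) = softmax2!(y, thisx * b)
    p = zeros(Float64, 10)
    cfg = ForwardDiff.JacobianConfig(pp!, p, beta)
    for i in 1:10:(size(x, 1) - 9)
        thisx .= @view x[i:(i + 9), :]
        push!(result, ForwardDiff.jacobian(pp!, p, beta, cfg))
    end
    return result
end

Since pp! captures thisx and that array is only mutated in place, reusing the same config across iterations should be valid, and it should at least cut the per-iteration allocations.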