Allocations with nested Jacobians with StaticArrays and ForwardDiff

I am working on a code that computes nested Jacobians with StaticArrays, but I cannot get it to run without allocations, apparently because of some internal type instability.

The MWE below shows the allocations, along with a weird behavior I have also seen while debugging our larger code: the type instability and the allocations disappear if, after benchmarking (or simply calling) my function, I redefine J_flat:

using ForwardDiff, StaticArrays, Chairmarks

@inline f(x) = SVector(x[1]^2 * x[2], sin(x[1]) + x[2]^3)

function J_flat(x) 
    y = ForwardDiff.jacobian(f, x)
    SA[y[1], y[2]]
end

function nested_jacobian(x0::SVector{2, T}) where T
    ForwardDiff.jacobian(J_flat, x0)
end

x0 = SVector(1.0, 2.0)
@code_warntype nested_jacobian(x0) # shows a type instability
@b nested_jacobian($x0) # 131.447 ns (4 allocs: 176 bytes)

function J_flat(x) 
    y = ForwardDiff.jacobian(f, x)
    SA[y[1], y[2]]
end

@code_warntype nested_jacobian(x0) # type instability is gone
@b nested_jacobian($x0) # 26.594 ns

Redefining J_flat without first benchmarking or calling nested_jacobian does not result in an allocation-free call:

using ForwardDiff, StaticArrays, Chairmarks

@inline f(x) = SVector(x[1]^2 * x[2], sin(x[1]) + x[2]^3)

function J_flat(x) 
    y = ForwardDiff.jacobian(f, x)
    SA[y[1], y[2]]
end

function nested_jacobian(x0::SVector{2, T}) where T
    ForwardDiff.jacobian(J_flat, x0)
end

x0 = SVector(1.0, 2.0)

function J_flat(x) 
    y = ForwardDiff.jacobian(f, x)
    SA[y[1], y[2]]
end

@code_warntype nested_jacobian(x0) # shows a type instability
@b nested_jacobian($x0) # 129.829 ns (4 allocs: 176 bytes)

Is there any workaround to avoid these allocations?

I think there have been a few threads on this over the years, but the only one I can find at the moment is my own from quite a while ago now: Type stability for higher derivatives in ForwardDiff

As far as I’m aware, there is currently no easy workaround for this. However, if some manual intervention is feasible in your case, then you can do something like this:

f(x) = SVector(x[1]^2 * x[2], sin(x[1]) + x[2]^3)

function J_flat(x)  
    y1 = ForwardDiff.derivative(x1 -> f(SA[x1, x[2]])[1], x[1])
    y2 = ForwardDiff.derivative(x1 -> f(SA[x1, x[2]])[2], x[1])

    SA[y1, y2]
end

function nested_jacobian(x0::SVector{2, T}) where T
    ForwardDiff.jacobian(J_flat, x0)
end

x0 = SVector(1.0, 2.0)
@code_warntype nested_jacobian(x0) # Is now type stable

Edit: Forgot to get first index of the output of f
Edit 2: Got the wrong indices, I think. I guess this showcases why doing this manually is less than ideal.
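
Since the index bookkeeping in a hand-written J_flat is easy to get wrong, a closed-form reference can help when checking any variant. The sketch below is not from the thread: the helper names expected_J_flat and expected_nested are hypothetical, and the formulas are derived by hand from f in the MWE, assuming J_flat returns the first column of the Jacobian of f, that is (∂f1/∂x1, ∂f2/∂x1). It uses only Base, so it runs without ForwardDiff or StaticArrays.

```julia
# Hand-derived reference values for the MWE (hypothetical helper names).
# f(x) = (x1^2 * x2, sin(x1) + x2^3), so the first column of its Jacobian,
# which is what J_flat is assumed to return, is (2*x1*x2, cos(x1)).
expected_J_flat(x) = [2 * x[1] * x[2], cos(x[1])]

# Jacobian of J_flat with respect to (x1, x2):
#   [ 2*x2       2*x1 ]
#   [ -sin(x1)   0    ]
expected_nested(x) = [2*x[2] 2*x[1]; -sin(x[1]) 0.0]

x0 = [1.0, 2.0]
expected_J_flat(x0)   # 2-element reference vector
expected_nested(x0)   # 2x2 reference matrix
```

With the packages loaded, comparing nested_jacobian(SVector(1.0, 2.0)) against expected_nested([1.0, 2.0]) with ≈ should catch a swapped index in a manual J_flat.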

I’m having the same issue when computing Lie brackets (which means the issue also exists in OptimalControl.jl). In your case, I found that simply specifying the input/output of J_flat seems to fix the issue.

using ForwardDiff, StaticArrays, BenchmarkTools

@inline f(x) = SVector(x[1]^2 * x[2], sin(x[1]) + x[2]^3)

function J_flat(x::SVector{2,T}) where T
    y = ForwardDiff.jacobian(f, x)
    SVector{2,T}(y[1], y[2])
end

function nested_jacobian(x0)
    return ForwardDiff.jacobian(J_flat, x0)
end

x0 = SVector(1.0, 2.0)

@code_warntype nested_jacobian(x0)  # no type instability
@btime nested_jacobian($x0)  # 9.781 ns (0 allocations: 0 bytes)

However, for my case of Lie brackets, this doesn’t work, so this might not work outside of your MWE. I’ll share my example below.

Maybe you could add the tags forwarddiff and staticarrays (maybe even autodiff) to help experts find this discussion.

In your example, another oddity shows up when using @code_llvm instead of @code_warntype: after the redefinition, the inner functions of ForwardDiff/partials.jl show up much more often than before.

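Just in case someone can parse them, here are the traces of @code_llvm before and after redefining J_flat in my code above. The first trace (before) contains the boxed allocations and a dynamic ijl_apply_generic call; the second (after) is fully inlined.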

;  @ REPL[4]:1 within `nested_jacobian`
define nonnull {}* @julia_nested_jacobian_324([1 x [2 x double]]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %0) #0 {
top:
  %1 = alloca [3 x {}*], align 8
  %gcframe10 = alloca [4 x {}*], align 16
  %gcframe10.sub = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe10, i64 0, i64 0
  %.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 0
  %2 = bitcast [4 x {}*]* %gcframe10 to i8*
  call void @llvm.memset.p0i8.i64(i8* align 16 %2, i8 0, i64 32, i1 true)
  %3 = alloca [1 x [2 x { double, [1 x [2 x double]] }]], align 8
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #9
  %tls_ppgcstack = getelementptr i8, i8* %thread_ptr, i64 -8
  %4 = bitcast i8* %tls_ppgcstack to {}****
  %tls_pgcstack = load {}***, {}**** %4, align 8
;  @ REPL[4]:2 within `nested_jacobian`
; ┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:66 within `jacobian`
; │┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:91 within `vector_mode_jacobian`
    %5 = bitcast [4 x {}*]* %gcframe10 to i64*
    store i64 8, i64* %5, align 16
    %6 = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe10, i64 0, i64 1
    %7 = bitcast {}** %6 to {}***
    %8 = load {}**, {}*** %tls_pgcstack, align 8
    store {}** %8, {}*** %7, align 8
    %9 = bitcast {}*** %tls_pgcstack to {}***
    store {}** %gcframe10.sub, {}*** %9, align 8
    call void @j_static_dual_eval_326([1 x [2 x { double, [1 x [2 x double]] }]]* noalias nocapture noundef nonnull sret([1 x [2 x { double, [1 x [2 x double]] }]]) %3, [1 x [2 x double]]* nocapture nonnull readonly %0)
    %ptls_field11 = getelementptr inbounds {}**, {}*** %tls_pgcstack, i64 2
    %10 = bitcast {}*** %ptls_field11 to i8**
    %ptls_load1213 = load i8*, i8** %10, align 8
    %box = call noalias nonnull dereferenceable(64) {}* @ijl_gc_pool_alloc(i8* %ptls_load1213, i32 896, i32 64) #6
    %11 = bitcast {}* %box to i64*
    %12 = getelementptr inbounds i64, i64* %11, i64 -1
    store atomic i64 133696916920400, i64* %12 unordered, align 8
    %13 = bitcast {}* %box to i8*
    %14 = bitcast [1 x [2 x { double, [1 x [2 x double]] }]]* %3 to i8*
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef align 8 dereferenceable(48) %13, i8* noundef nonnull align 8 dereferenceable(48) %14, i64 48, i1 false)
    %15 = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe10, i64 0, i64 3
    store {}* %box, {}** %15, align 8
    %ptls_load81415 = load i8*, i8** %10, align 8
    %box3 = call noalias nonnull dereferenceable(32) {}* @ijl_gc_pool_alloc(i8* %ptls_load81415, i32 800, i32 32) #6
    %16 = bitcast {}* %box3 to i64*
    %17 = getelementptr inbounds i64, i64* %16, i64 -1
    store atomic i64 133696928609808, i64* %17 unordered, align 8
    %18 = bitcast {}* %box3 to i8*
    %19 = bitcast [1 x [2 x double]]* %0 to i8*
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef align 8 dereferenceable(16) %18, i8* noundef nonnull align 8 dereferenceable(16) %19, i64 16, i1 false)
    %20 = getelementptr inbounds [4 x {}*], [4 x {}*]* %gcframe10, i64 0, i64 2
    store {}* %box3, {}** %20, align 16
    store {}* inttoptr (i64 133696925208208 to {}*), {}** %.sub, align 8
    %21 = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 1
    store {}* %box, {}** %21, align 8
    %22 = getelementptr inbounds [3 x {}*], [3 x {}*]* %1, i64 0, i64 2
    store {}* %box3, {}** %22, align 8
    %23 = call nonnull {}* @ijl_apply_generic({}* inttoptr (i64 133696871410736 to {}*), {}** nonnull %.sub, i32 3)
    %24 = load {}*, {}** %6, align 8
    %25 = bitcast {}*** %tls_pgcstack to {}**
    store {}* %24, {}** %25, align 8
; └└
  ret {}* %23
}
;  @ REPL[4]:1 within `nested_jacobian`
define void @julia_nested_jacobian_343([1 x [4 x double]]* noalias nocapture noundef nonnull sret([1 x [4 x double]]) align 8 dereferenceable(32) %0, [1 x [2 x double]]* nocapture noundef nonnull readonly align 8 dereferenceable(16) %1) #0 {
top:
  %2 = alloca [2 x double], align 8
;  @ REPL[4]:2 within `nested_jacobian`
; ┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:66 within `jacobian`
; │┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:91 within `vector_mode_jacobian`
; ││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:24 within `static_dual_eval`
; │││┌ @ REPL[8]:2 within `J_flat`
; ││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:66 within `jacobian`
; │││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:91 within `vector_mode_jacobian`
; ││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:24 within `static_dual_eval`
; │││││││┌ @ REPL[2]:1 within `f`
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:693 within `sin`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:703 within `sincos`
; ││││││││││┌ @ tuple.jl:92 within `indexed_iterate` @ tuple.jl:92
             %3 = getelementptr inbounds [2 x double], [2 x double]* %2, i64 0, i64 0
; │││││││││││ @ tuple.jl:92 within `indexed_iterate`
             %4 = getelementptr inbounds [2 x double], [2 x double]* %2, i64 0, i64 1
; └└└└└└└└└└└
  %newstruct13.sroa.0.0..sroa_idx = getelementptr inbounds [1 x [4 x double]], [1 x [4 x double]]* %0, i64 0, i64 0, i64 0
  %newstruct13.sroa.2.0..sroa_idx14 = getelementptr inbounds [1 x [4 x double]], [1 x [4 x double]]* %0, i64 0, i64 0, i64 1
  %newstruct13.sroa.3.0..sroa_idx15 = getelementptr inbounds [1 x [4 x double]], [1 x [4 x double]]* %0, i64 0, i64 0, i64 2
; ┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:66 within `jacobian`
; │┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:91 within `vector_mode_jacobian`
; ││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:24 within `static_dual_eval`
; │││┌ @ REPL[8]:2 within `J_flat`
; ││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:66 within `jacobian`
; │││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:91 within `vector_mode_jacobian`
; ││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/ext/ForwardDiffStaticArraysExt.jl:24 within `static_dual_eval`
; │││││││┌ @ REPL[2]:1 within `f`
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:577 within `literal_pow` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:577 @ intfuncs.jl:332
; │││││││││┌ @ float.jl:411 within `*`
            %5 = bitcast [1 x [2 x double]]* %1 to <2 x double>*
            %6 = load <2 x double>, <2 x double>* %5, align 8
; │││││││││└
; │││││││││ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:577 within `literal_pow` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:578
; │││││││││┌ @ promotion.jl:423 within `*` @ float.jl:411
            %7 = insertelement <2 x double> %6, double 2.000000e+00, i64 0
            %8 = fmul <2 x double> %6, %7
; │││││││││└
; │││││││││ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:577 within `literal_pow` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:579
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:83 within `*` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:110
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:199 within `scale_tuple`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││┌ @ float.jl:411 within `*`
               %9 = fmul <2 x double> %8, <double 0.000000e+00, double 3.000000e+00>
; │││││││││└└└└
; │││││││││ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:579 within `literal_pow`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:83 within `*` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:110
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:199 within `scale_tuple`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││││││┌ @ float.jl:409 within `+`
                    %10 = extractelement <2 x double> %9, i64 0
                    %11 = fadd double %10, 2.000000e+00
                    %12 = fadd double %10, 0.000000e+00
; ││││││││└└└└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; ││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; │││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││││││┌ @ float.jl:411 within `*`
                     %13 = extractelement <2 x double> %6, i64 1
                     %14 = fmul double %13, %11
; ││││││││││││││││││└
; ││││││││││││││││││┌ @ float.jl:409 within `+`
                     %15 = fadd double %10, %14
; ││││││││└└└└└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:578 within `literal_pow` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:578
; │││││││││┌ @ promotion.jl:423 within `*` @ float.jl:411
            %16 = fmul double %13, 2.000000e+00
; │││││││││└
; │││││││││ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:577 within `literal_pow` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:577 @ intfuncs.jl:332
; │││││││││┌ @ float.jl:411 within `*`
            %17 = insertelement <2 x double> %6, double %16, i64 1
            %18 = insertelement <2 x double> %6, double 3.000000e+00, i64 1
            %19 = fmul <2 x double> %17, %18
; ││││││││└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; ││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; │││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││││││┌ @ float.jl:411 within `*`
                     %20 = fmul <2 x double> %19, zeroinitializer
; ││││││││││││││││││└
; ││││││││││││││││││┌ @ float.jl:409 within `+`
                     %21 = fadd <2 x double> %9, %20
                     %22 = extractelement <2 x double> %21, i64 0
; │││││││││││││└└└└└└
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:487 within `+` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:80
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:207 within `add_tuples`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││││┌ @ float.jl:409 within `+`
                   %23 = fadd double %22, %15
; ││││││││└└└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:693 within `sin`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:703 within `sincos`
            %24 = extractelement <2 x double> %6, i64 0
            call void @j_sincos_345([2 x double]* noalias nocapture noundef nonnull sret([2 x double]) %2, double %24)
; ││││││││││ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:704 within `sincos`
; ││││││││││┌ @ float.jl:407 within `-`
             %unbox7 = load double, double* %3, align 8
; ││││││││└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; ││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; │││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││││││┌ @ float.jl:411 within `*`
                     %25 = insertelement <2 x double> %6, double %unbox7, i64 0
                     %26 = insertelement <2 x double> <double 0.000000e+00, double poison>, double %12, i64 1
                     %27 = fmul <2 x double> %25, %26
; ││││││││└└└└└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:694 within `sin`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:83 within `*` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:110
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:199 within `scale_tuple`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││││││┌ @ float.jl:411 within `*`
                    %unbox8 = load double, double* %4, align 8
                    %28 = fmul double %unbox8, 0.000000e+00
; │││││││││││││││││└
; │││││││││││││││││┌ @ float.jl:409 within `+`
                    %29 = fsub double %28, %unbox7
; ││││││││└└└└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; ││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; │││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││││││┌ @ float.jl:409 within `+`
                     %30 = insertelement <2 x double> poison, double %28, i64 0
                     %31 = shufflevector <2 x double> %30, <2 x double> %8, <2 x i32> <i32 0, i32 2>
                     %32 = fadd <2 x double> %31, %27
                     %33 = fsub <2 x double> %31, %27
                     %34 = shufflevector <2 x double> %32, <2 x double> %33, <2 x i32> <i32 1, i32 2>
; ││││││││└└└└└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:578 within `literal_pow` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:579
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:83 within `*` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:110
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:199 within `scale_tuple`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││┌ @ float.jl:411 within `*`
               %35 = fmul double %16, 0.000000e+00
; │││││││││└└└└
; │││││││││ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:578 within `literal_pow`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:296 within `*`
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:215 within `dual_definition_retval`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:83 within `*` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:110
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:199 within `scale_tuple`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││┌ @ promotion.jl:423 within `*` @ float.jl:411
                 %36 = fmul double %35, 3.000000e+00
; ││││││││└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; ││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; │││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││││││┌ @ float.jl:411 within `*`
                     %37 = fmul <2 x double> %9, zeroinitializer
; ││││││││└└└└└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:579 within `literal_pow`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:83 within `*` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:110
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:199 within `scale_tuple`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││││││┌ @ float.jl:411 within `*`
                    %38 = fmul double %36, 0.000000e+00
; │││││││││││││││││└
; │││││││││││││││││┌ @ float.jl:409 within `+`
                    %39 = extractelement <2 x double> %37, i64 1
                    %40 = fadd double %39, %38
; ││││││││└└└└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; ││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; │││││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││││││┌ @ float.jl:409 within `+`
                     %41 = fadd <2 x double> %37, %20
; ││││││││└└└└└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:487 within `+` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:80
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:207 within `add_tuples`
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:487 within `+` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:80
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:207 within `add_tuples`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││┌ @ float.jl:409 within `+`
                 %42 = fadd double %29, %40
; ││││││││└└└└└└└
; ││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:286 within `*`
; │││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:218 within `dual_definition_retval`
; ││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:118 within `_mul_partials`
; │││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:219 within `mul_tuples`
; ││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; │││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/dual.jl:487 within `+` @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:80
; ││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:207 within `add_tuples`
; │││││││││││││││┌ @ /home/tremelow/.julia/packages/ForwardDiff/X74OO/src/partials.jl:156 within `macro expansion`
; ││││││││││││││││┌ @ float.jl:409 within `+`
                   %43 = fadd <2 x double> %41, %34
; β””β””β””β””β””β””β””β””β””β””β””β””β””β””β””β””β””
  store double %23, double* %newstruct13.sroa.0.0..sroa_idx, align 8
  store double %42, double* %newstruct13.sroa.2.0..sroa_idx14, align 8
  %44 = bitcast double* %newstruct13.sroa.3.0..sroa_idx15 to <2 x double>*
  store <2 x double> %43, <2 x double>* %44, align 8
  ret void
}

Here is my example for Lie brackets
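
For reference, and reading the convention off the `fg` function in the code that follows (other texts use the opposite sign), the quantity being differentiated is the bracket of the two vector fields, with $J_f$ and $J_g$ denoting the Jacobians of `f` and `g`:

```latex
$$
[f, g](x) = J_g(x)\, f(x) - J_f(x)\, g(x)
$$
```

Since `jacfg` takes the Jacobian of this expression, it needs second derivatives of `f` and `g`, which is where the nested `ForwardDiff.jacobian` calls come from.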

using ForwardDiff, StaticArrays, BenchmarkTools

f(x::SVector{2,T}) where T = SVector{2,T}(x[1] * exp(x[2]), -exp(x[1]))
g(x::SVector{2,T}) where T = SVector{2,T}(exp(0.9 * x[2]), -1.2 * exp(x[1]))
x0 = @SVector [0.2, 0.5]

function fg(x::SVector{2,T}) where T
    fx, jacfx = f(x), ForwardDiff.jacobian(f, x)
    gx, jacgx = g(x), ForwardDiff.jacobian(g, x)
    return SVector{2,T}(jacgx * fx - jacfx * gx)
end

function jacfg(x::SVector{2,T}) where T
    valx = fg(x)
    jacx = ForwardDiff.jacobian(fg, x)
    return valx, jacx
end

@btime jacfg($x0)  # 229.431 ns (6 allocations: 272 bytes)


function fg(x::SVector{2,T}) where T
    fx, jacfx = f(x), ForwardDiff.jacobian(f, x)
    gx, jacgx = g(x), ForwardDiff.jacobian(g, x)
    return SVector{2,T}(jacgx * fx - jacfx * gx)
end

@btime jacfg($x0)  # 58.745 ns (0 allocations: 0 bytes)

To avoid redefining fg, I found an ugly ad-hoc fix by adding the following lines before calling jacfg:

xtmp = @SVector ForwardDiff.Dual{ForwardDiff.Tag{typeof(fg), Float64}, Float64, 2}[ForwardDiff.Dual{ForwardDiff.Tag{typeof(fg), Float64}}(0.2,1.0,0.0), ForwardDiff.Dual{ForwardDiff.Tag{typeof(fg), Float64}}(0.5,0.0,1.0)]
ForwardDiff.jacobian(f, xtmp)
ForwardDiff.jacobian(g, xtmp)

but I wouldn’t know how to generalize this for deeper nesting…
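
A minimal sketch of how the seeding step might at least be written without spelling out the `Dual` type by hand. The helper name `seed_duals` is hypothetical (it is not part of ForwardDiff), and it only reproduces the hand-written `xtmp` above for one level of nesting; whether applying it repeatedly helps with deeper nesting has not been checked here.

```julia
using ForwardDiff, StaticArrays

# Same f, g and fg as in the Lie bracket example.
f(x::SVector{2,T}) where T = SVector{2,T}(x[1] * exp(x[2]), -exp(x[1]))
g(x::SVector{2,T}) where T = SVector{2,T}(exp(0.9 * x[2]), -1.2 * exp(x[1]))

function fg(x::SVector{2,T}) where T
    fx, jacfx = f(x), ForwardDiff.jacobian(f, x)
    gx, jacgx = g(x), ForwardDiff.jacobian(g, x)
    return SVector{2,T}(jacgx * fx - jacfx * gx)
end

# Hypothetical helper: wrap each entry of `x` in a Dual tagged with `outer`,
# using the i-th unit vector as partials (the same layout as the manual xtmp).
function seed_duals(outer::F, x::SVector{N,T}) where {F,N,T}
    Tg = ForwardDiff.Tag{F,T}
    duals = ntuple(Val(N)) do i
        ForwardDiff.Dual{Tg}(x[i], ntuple(j -> T(i == j), Val(N)))
    end
    return SVector{N}(duals)
end

x0 = @SVector [0.2, 0.5]
xtmp = seed_duals(fg, x0)

# Warm up the inner Jacobians on Dual-valued input before the first call to jacfg.
ForwardDiff.jacobian(f, xtmp)
ForwardDiff.jacobian(g, xtmp)
```

The tag is built as `ForwardDiff.Tag{typeof(fg), Float64}` to match the manual version; this relies on ForwardDiff internals, so it may break between releases.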

For me, this MWE shows allocations: 44.746 ns (2 allocs: 96 bytes)

using ForwardDiff, StaticArrays, Chairmarks

@inline f(x::SVector{N, T}) where {N, T} = SVector{N, T}(x[1]^2 * x[2], sin(x[1]) + x[2]^3)

@inline function J_flat(x::SVector{2,T}) where T
    y = ForwardDiff.jacobian(f, x)
    SVector{2,T}(y[1], y[2])
end

function nested_jacobian(x0)
    return ForwardDiff.jacobian(J_flat, x0)
end

x0 = SVector(1.0, 2.0)

@b nested_jacobian($x0) 

Same here, that’s odd. It seems that specifying the signature of f has this impact, since the only difference is f(x::SVector{N,T}) instead of f(x). What’s weird as well is that @code_warntype raises no flags.

After checking, f(x::SVector) does not allocate either, and {N,T} can be removed from J_flat as well. However, the same change does not remove the allocations in the Lie bracket example.