It might have to do with broadcasting. The pullback is faster when the function is written to operate on
whole vectors directly (complicated_wb below) than when the derivative of the scalar complicated is broadcast over x:
complicated_wb(x) = sin.(x) .^ 2 .* exp.(x) .* x .^ (-2) .+ cos.(x) .^ 3
x = 1:0.01:10
> @btime complicated_wb(x)
104.899 μs (1 allocation: 7.19 KiB)
> @btime pullback(complicated_wb,x)[2](ones(length(x)))[1]
331.299 μs (8230 allocations: 380.52 KiB)
> @btime complicated'.(x)
759.201 μs (14420 allocations: 1006.92 KiB)
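To check that the two approaches compute the same thing, here is a minimal sketch. It assumes the scalar complicated is the elementwise counterpart of complicated_wb (that definition is my reconstruction, not copied from the original post), and it uses collect to get a plain Vector. Seeding the pullback with ones works because the function is elementwise, so its Jacobian is diagonal and the pullback of a vector of ones returns the per-element derivatives.

```julia
using Zygote

# Assumed scalar definition, reconstructed from complicated_wb (hypothetical).
complicated(x) = sin(x)^2 * exp(x) * x^(-2) + cos(x)^3

# Vectorized version from the post: all broadcasting happens inside the function.
complicated_wb(x) = sin.(x) .^ 2 .* exp.(x) .* x .^ (-2) .+ cos.(x) .^ 3

x = collect(1:0.01:10)

# One reverse pass over the whole vector.
y, back = Zygote.pullback(complicated_wb, x)
g_vec = back(ones(length(x)))[1]

# One reverse pass per element: broadcast the scalar derivative over x.
g_elem = complicated'.(x)

# Both routes should agree up to floating-point error.
@assert g_vec ≈ g_elem
```

When timing these with @btime, interpolating the global as $x (for example @btime complicated_wb($x)) keeps the cost of accessing a non-constant global out of the measurement.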