One way of solving this kind of problem is as follows. Instead of preallocating stuff manually, we’ll first create a function make_buffer
that knows how to create an appropriate work buffer for an in-place function given the function and its input argument. For f!
, that would be:
make_buffer(::typeof(f!), p) = similar(p, length(x0))
This will use the element type of p
to create the right type of work buffer. We can then use this to create a function that turns an in-place function into an out-of-place function:
function make_out_of_place_func(f!, x)
y = make_buffer(f!, x)
x -> f!(y, x)
end
An out-of-place function created using make_out_of_place_func
only accepts one element type for x
, but we want a solution that accepts generic x
s, so that it works for ForwardDiff.Dual
as well. One way to do that is as follows:
struct OutOfPlace{F}
f!::F
out_of_place_funcs::Dict{Type, Function}
end
OutOfPlace(f!) = OutOfPlace(f!, Dict{Type, Function}())
eval_f(f::F, x) where {F} = f(x) # function barrier
function (oop::OutOfPlace{F})(x) where {F}
T = eltype(x)
f = get!(() -> make_out_of_place_func(oop.f!, x), oop.out_of_place_funcs, T)
eval_f(f, x)
end
const f = OutOfPlace(f!)
i.e., store the in-place function in a struct
, along with a dictionary that maps element types of x
to out-of-place functions, where a new out-of–place function is automatically generated if the element type of x
has not been seen before, but the out-of-place function is reused if a suitable one has already been created. This is pretty efficient:
using BenchmarkTools, Random
@btime f!(out, p) setup = begin # 199.650 μs (0 allocations: 0 bytes)
out = similar($x0)
p = rand!(similar(p0))
end
@btime f(p) setup = p = rand!(similar(p0)); # 220.685 μs (1 allocation: 32 bytes)
We can now create an in-place Jacobian function based on the out-of-place version of function f
:
# derivative just uses `f`, not `f!`
g! = (outjac, p) -> ForwardDiff.jacobian!(outjac, f, p)
@btime g!(outjac, p) setup = begin
outjac = Array{Float64}(undef, length(x0), length(p0))
p = rand!(similar(p0))
end # 388.684 μs (3 allocations: 208 bytes) (allocations would scale with size of `p`)
and then use the same trick again:
# same trick to turn g! into an out-of-place function
make_buffer(::typeof(g!), p) = similar(p, (length(x0), length(p)))
const g = OutOfPlace(g!)
h! = (outjac2, p) -> ForwardDiff.jacobian!(outjac2, g, p)
@btime h!(outjac2, p) setup = begin # 1.637 ms (6 allocations: 576 bytes)
outjac2 = Array{Float64}(undef, length(x0) * length(p0), length(p0))
p = rand!(similar(p0))
end
I’ve done something similar before in RigidBodyDynamics.jl/caches.jl at master · JuliaRobotics/RigidBodyDynamics.jl · GitHub
Full code:
using ForwardDiff
using LinearAlgebra
const x0 = randn(50_000);
const p0 = randn(2);
f!(out, p) = @. out = p[1] * exp(-x0 * p[2])
make_buffer(::typeof(f!), p) = similar(p, length(x0))
function make_out_of_place_func(f!, x)
y = make_buffer(f!, x) # make_buffer overloads will be defined below
x -> f!(y, x)
end
struct OutOfPlace{F}
f!::F
out_of_place_funcs::Dict{Type, Function}
end
OutOfPlace(f!) = OutOfPlace(f!, Dict{Type, Function}())
eval_f(f::F, x) where {F} = f(x) # function barrier
function (oop::OutOfPlace{F})(x) where {F}
T = eltype(x)
f = get!(() -> make_out_of_place_func(oop.f!, x), oop.out_of_place_funcs, T)
eval_f(f, x)
end
const f = OutOfPlace(f!)
using BenchmarkTools, Random
@btime f!(out, p) setup = begin # 199.650 μs (0 allocations: 0 bytes)
out = similar($x0)
p = rand!(similar(p0))
end
@btime f(p) setup = p = rand!(similar(p0)); # 220.685 μs (1 allocation: 32 bytes)
# derivative just uses `f`, not `f!`
g! = (outjac, p) -> ForwardDiff.jacobian!(outjac, f, p)
@btime g!(outjac, p) setup = begin
outjac = Array{Float64}(undef, length(x0), length(p0))
p = rand!(similar(p0))
end # 388.684 μs (3 allocations: 208 bytes) (allocations would scale with size of `p`)
# same trick to turn g! into an out-of-place function
make_buffer(::typeof(g!), p) = similar(p, (length(x0), length(p)))
const g = OutOfPlace(g!)
h! = (outjac2, p) -> ForwardDiff.jacobian!(outjac2, g, p)
@btime h!(outjac2, p) setup = begin # 1.637 ms (6 allocations: 576 bytes)
outjac2 = Array{Float64}(undef, length(x0) * length(p0), length(p0))
p = rand!(similar(p0))
end