"Optional" in-place output arguments

I tried it, and this works.

At least I can see it work when the input arrays are StaticArrays (which is my real use case anyway). There is a gain in performance, and the output of @code_typed clearly changes in length.

using StaticArrays
using BenchmarkTools

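# Placeholder type, passed in place of any output that is not needed
struct NotWanted end
# Writing into a NotWanted does nothing, so the compiler is free to skip the work feeding it
function Base.setindex!(A::NotWanted,X,inds...) end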

function foo!(o1,o2,o3,i1,i2)
        o1[:] = 2*i1
        o2[:] = 3(i1.+i2)
        tmp   = i1.+i2
        o3[:] = 3tmp
        return nothing
end

const n = 100
o1 = Vector{Float64}(undef,n)
o2 = Vector{Float64}(undef,n)
o3 = Vector{Float64}(undef,n)
nw = NotWanted()
i1 = @SVector [randn() for i = 1:n]
i2 = @SVector [randn() for i = 1:n]

@btime foo!(o1,o2,o3,i1,i2)
@btime foo!(nw,o2,o3,i1,i2)
@btime foo!(o1,nw,o3,i1,i2)
@btime foo!(o1,o2,nw,i1,i2)
@btime foo!(o1,nw,nw,i1,i2)
@btime foo!(nw,o2,nw,i1,i2)
@btime foo!(nw,nw,o3,i1,i2)
@btime foo!(nw,nw,nw,i1,i2)

yields

  268.151 ns (0 allocations: 0 bytes)
  181.903 ns (0 allocations: 0 bytes)
  180.184 ns (0 allocations: 0 bytes)
  179.268 ns (0 allocations: 0 bytes)
  98.613 ns (0 allocations: 0 bytes)
  105.870 ns (0 allocations: 0 bytes)
  110.408 ns (0 allocations: 0 bytes)
  16.733 ns (0 allocations: 0 bytes)

The Julia creators comment that “immutables are easier to reason about”: maybe the compiler finds it easier to prove that a stack-allocated immutable variable will never be needed…
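For anyone who wants to check the @code_typed observation without reading the listings by eye, here is a minimal sketch that counts the statements in the optimized typed code for a few argument signatures. The names Discard, bar! and nstatements are hypothetical stand-ins, not the definitions from the post. Plain Vectors are used as inputs so that it runs with Base alone, which is an assumption: with Vectors the intermediate arrays may still be allocated, so the counts need not shrink as much as they do with StaticArrays.

```julia
# Hypothetical stand-in for NotWanted: a marker type whose writes are dropped.
struct Discard end
Base.setindex!(::Discard, X, inds...) = nothing

# Hypothetical two-output version of the function in the post.
function bar!(o1, o2, i1, i2)
    o1[:] = 2 * i1
    o2[:] = 3 * (i1 .+ i2)
    return nothing
end

# Number of statements in the optimized, typed code for one argument signature.
# Base.code_typed returns a vector of `CodeInfo => return type` pairs.
function nstatements(f, argtypes)
    codeinfo = first(Base.code_typed(f, argtypes)).first
    return length(codeinfo.code)
end

V = Vector{Float64}
for sig in ((V, V, V, V), (Discard, V, V, V), (Discard, Discard, V, V))
    println(sig[1:2], " => ", nstatements(bar!, sig), " statements")
end
```

The statement count is only a rough proxy for the work done, and the exact numbers depend on the Julia version, so it is best read alongside the @btime results rather than instead of them.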

Thanks everyone!
