Allocating OnlineStats on the stack

I’m using OnlineStats to calculate a likelihood function, so I create and discard a LogSumExp for each row. LogSumExps, like all online stats, are mutable structs and are therefore heap-allocated in general. However, I thought that since the LogSumExp is never visible outside the function, the optimizer would eliminate those allocations, as discussed here. Yet the MWE below prints `0.000000 seconds (1 allocation: 112 bytes)`. Since this happens once per row, it’s a significant performance penalty. Any suggestions?

MWE:

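```julia
using OnlineStats

function logsumexp(vals)
    # LogSumExp is a mutable struct that never leaves this function
    result = LogSumExp()

    for val in vals
        fit!(result, val)
    end

    x = value(result)

    return x
end

function main()
    logsumexp([1, 2, 3, 4, 5, 6])  # warm-up call, so @time does not include compilation
    @time logsumexp([1, 2, 3, 4, 5, 6])
end

main()
```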

Actual example code: src/mnl.jl on the main branch of mattwigway/DiscreteChoiceModels.jl (GitHub)

If I understand correctly, you call logsumexp many times inside a loop (once per row), so you pay its allocation cost on every iteration.
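For reference, here is a rough sketch of that per-row pattern, assuming the rows are stored as the rows of a matrix. The names `streaming_logsumexp` and `rowwise_logsumexp!` are hypothetical and not OnlineStats API. This swaps OnlineStats’ `LogSumExp` for a hand-written, numerically stable streaming update whose running state is just two local scalars, so there is no mutable object that could end up on the heap for each row.

```julia
# Hypothetical streaming log-sum-exp (not OnlineStats API).
# State is two local scalars: m = running maximum, s = sum(exp(x - m)) so far.
function streaming_logsumexp(vals)
    m = -Inf
    s = 0.0
    for x in vals
        if x <= m
            s += exp(x - m)
        else
            # new maximum: rescale the existing sum to the new reference point
            s = s * exp(m - x) + 1.0
            m = float(x)
        end
    end
    return m + log(s)
end

# Hypothetical per-row driver: one log-sum-exp per row of `data`,
# written into a preallocated output vector.
function rowwise_logsumexp!(out, data::AbstractMatrix)
    for i in axes(data, 1)
        out[i] = streaming_logsumexp(@view data[i, :])
    end
    return out
end

data = [1.0 2.0 3.0; 0.5 0.5 0.5]
out = similar(data, size(data, 1))
rowwise_logsumexp!(out, data)
```

The output vector is allocated once by the caller, and each row is passed as a view, so the per-row work should not need fresh arrays either.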

The options I can think of are:

  1. Rework logsumexp to take a preallocated LogSumExp as an argument and reset it (zero it out) at the start of each call.
  2. Try the experimental bump allocator Bumper.jl (MasonProtter/Bumper.jl on GitHub, “Bring Your Own Stack”). It will probably require some changes to how logsumexp is set up.
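Option 1 could look roughly like the sketch below. I’m not sure whether OnlineStats offers a way to reset a `LogSumExp` in place, so this uses a small hypothetical mutable accumulator (`LSEAccumulator`, `reset!`, `push_value!`, `current_value`, `logsumexp!`; none of these are OnlineStats API). The accumulator is allocated once, outside the per-row loop, and reset at the start of each call.

```julia
# Hypothetical reusable accumulator (not part of OnlineStats).
mutable struct LSEAccumulator
    m::Float64   # running maximum
    s::Float64   # running sum of exp(x - m)
end

LSEAccumulator() = LSEAccumulator(-Inf, 0.0)

# Reset in place so the same object can be reused for the next row.
function reset!(acc::LSEAccumulator)
    acc.m = -Inf
    acc.s = 0.0
    return acc
end

# Numerically stable streaming update.
function push_value!(acc::LSEAccumulator, x::Real)
    if x <= acc.m
        acc.s += exp(x - acc.m)
    else
        acc.s = acc.s * exp(acc.m - x) + 1.0   # rescale before moving the max
        acc.m = x
    end
    return acc
end

current_value(acc::LSEAccumulator) = acc.m + log(acc.s)

# logsumexp reworked to take a preallocated accumulator and reset it first.
function logsumexp!(acc::LSEAccumulator, vals)
    reset!(acc)
    for val in vals
        push_value!(acc, val)
    end
    return current_value(acc)
end

acc = LSEAccumulator()                # allocated once
row1 = logsumexp!(acc, [1, 2, 3])
row2 = logsumexp!(acc, [0.5, 0.5])    # same object, reset internally
```

Because `acc` is created once and only mutated afterwards, the per-row calls do not allocate a new accumulator. The caller still has to avoid building a fresh `vals` array for each row.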

I do not have domain knowledge here, so I might be missing a more obvious domain-specific solution.

I just realized that in my example code the allocation comes from creating the Vector, not from the LogSumExp. That isn’t the case in the actual code, so I will work on making a better MWE.
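To separate the cost of building the input from the cost of the reduction itself, the input can be created outside the measured call. The following is a minimal sketch that uses a plain two-pass log-sum-exp (hypothetical `lse`, not OnlineStats) so that it runs without any packages. `@allocated` is called inside a function, after a warm-up call, so neither global-variable access nor compilation is counted.

```julia
# Hypothetical two-pass, numerically stable log-sum-exp:
# shift by the maximum before exponentiating.
function lse(vals)
    m = maximum(vals)
    return m + log(sum(x -> exp(x - m), vals))
end

function measure_allocations()
    v = [1, 2, 3, 4, 5, 6]      # built once, outside the measured call
    t = (1, 2, 3, 4, 5, 6)      # a tuple needs no heap allocation at all
    lse(v); lse(t)              # warm up so compilation is not measured
    bytes_vector_input = @allocated lse(v)
    bytes_tuple_input = @allocated lse(t)
    bytes_with_literal = @allocated lse([1, 2, 3, 4, 5, 6])  # includes building the Vector
    return bytes_vector_input, bytes_tuple_input, bytes_with_literal
end

measure_allocations()
```

If the first two numbers are zero and only the third is not, the allocation reported by `@time` in the MWE belongs to the vector literal rather than to the reduction.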